Soham Digambar, Jovi Mahajan, Sai Subramanian
Motivation
It is difficult to accurately compare the salaries of NBA players based on the college they attended because there are many factors that can influence a player's salary, such as their skill level, their experience, and their marketability. Additionally, the popularity of a college's basketball program does not necessarily have a direct correlation with the success or salaries of its alumni in the NBA.
That being said, it is worth noting that some colleges have a history of producing successful NBA players and may be seen as a breeding ground for future professionals. For example, Duke University and the University of Kentucky are both well-known for their successful basketball programs and have produced a number of high-profile NBA players over the years. On the other hand, there are also many talented players who come from less well-known colleges and still go on to have successful careers in the NBA.
Thus, an analysis like this is still important in the scope of the future of the NBA and NCAA. We want to share the data science pipeline with more people and think that this tutorial will be a good start!
Goal
In this tutorial, we aim to find out whether the most-attended NCAA schools typically lead their NBA-bound players to higher salaries than the least-attended NCAA schools. While we recognize that "success" in the NBA isn't purely defined by financial compensation, it is a great motivator for many aspiring basketball players hoping to enter the NBA.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from collections import defaultdict
from scipy.stats import ttest_ind
import folium
import requests
from bs4 import BeautifulSoup
import urllib.parse
import statsmodels.api as sm
In order to begin this process, we must grab a few datasets that are relevant and accessible. We have decided to use two datasets obtained from the enterprise data catalog website data.world throughout this tutorial.
The first dataset, found here, contains information about NBA drafts from 1989 to 2016. It includes information about the year of the draft, the round, the pick number, the team that made the pick, the player who was drafted, and their college (if applicable). The dataset also includes statistical information about the player's career in the NBA, including the number of years they played, the number of games they played in, the total number of minutes they played, the total number of points they scored, the total number of rebounds they had, and the total number of assists they had. There is one row for each pick in each draft, so the dataset includes information about every player who was drafted during this time period.
The second dataset, found here, contains information about NBA players and their salaries from 1996 to 2017. It includes the player's name, team, salary for the year, position on the court, age, and statistical data such as games played, minutes played, points scored, assists, and rebounds. The dataset has one row for each player in each year, so it includes salary and statistical data for all players in the NBA over this time period. The data can be used to understand trends in player salaries and performance over time, as well as to compare the salaries and performance of different players.
In order to maintain a consistent time frame, we will restrict both datasets to the years 1996-2016.
# downloaded .xlsx from data.world
# 1. https://data.world/nolanoreilly495/nba-data-with-salaries-1996-2017/workspace/file?filename=NBA+Data+With+Salaries.xlsx
# 2. https://data.world/gmoney/nba-drafts-2016-1989
salaries = pd.read_excel("/content/NBA Data With Salaries.xlsx") # NBA Data with salary Compensation from 1996-2017
drafts = pd.read_excel("/content/NBA Drafts.xlsx") # NBA Draft Data from 1989-2016
# We can drop all the rows where there is no salary information
salaries = salaries[salaries.Salary != 0]
# restrict the timeframe to 1996-2016 with a single boolean mask
salaries = salaries[(salaries['Year'] >= 1996) & (salaries['Year'] <= 2016)]
salaries
| Player | Year | Tm | Salary | Pos | Age | G | GS | PER | TS% | ... | FTA/G | ORB/G | DRB/G | TRB/G | AST/G | STL/G | BLK/G | TOV/G | PF/G | PTS/G | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Al Horford | 2016 | ATL | 12000000 | C | 29 | 82 | 82 | 19.4 | 0.565 | ... | 1.573171 | 1.804878 | 5.463415 | 7.268293 | 3.207317 | 0.829268 | 1.475610 | 1.304878 | 1.987805 | 15.231707 |
| 1 | Tiago Splitter | 2016 | ATL | 8500000 | C | 31 | 36 | 2 | 13.7 | 0.571 | ... | 1.333333 | 1.250000 | 2.083333 | 3.333333 | 0.833333 | 0.555556 | 0.333333 | 0.666667 | 1.972222 | 5.583333 |
| 2 | Jeff Teague | 2016 | ATL | 8000000 | PG | 27 | 79 | 78 | 17.9 | 0.551 | ... | 3.949367 | 0.417722 | 2.291139 | 2.708861 | 5.949367 | 1.227848 | 0.303797 | 2.759494 | 2.113924 | 15.683544 |
| 3 | Kyle Korver | 2016 | ATL | 5746479 | SG | 34 | 80 | 80 | 9.7 | 0.578 | ... | 0.675000 | 0.175000 | 3.075000 | 3.250000 | 2.050000 | 0.750000 | 0.437500 | 1.225000 | 2.012500 | 9.237500 |
| 4 | Thabo Sefolosha | 2016 | ATL | 4000000 | SF | 31 | 75 | 11 | 12.4 | 0.578 | ... | 1.320000 | 0.693333 | 3.760000 | 4.453333 | 1.426667 | 1.133333 | 0.493333 | 0.906667 | 1.453333 | 6.400000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8909 | Jim McIlvaine | 1996 | WAS | 525000 | C | 23 | 80 | 6 | 9.7 | 0.476 | ... | 1.312500 | 0.825000 | 2.050000 | 2.875000 | 0.137500 | 0.262500 | 2.075000 | 0.450000 | 2.137500 | 2.275000 |
| 8910 | Mitchell Butler | 1996 | WAS | 500000 | SG | 25 | 61 | 3 | 7.4 | 0.446 | ... | 1.360656 | 0.475410 | 1.459016 | 1.934426 | 1.098361 | 0.672131 | 0.196721 | 1.098361 | 1.704918 | 3.885246 |
| 8911 | Bob McCann | 1996 | WAS | 300000 | PF | 31 | 62 | 0 | 9.1 | 0.507 | ... | 1.193548 | 0.741935 | 1.564516 | 2.306452 | 0.387097 | 0.338710 | 0.241935 | 0.677419 | 1.870968 | 3.032258 |
| 8912 | Brent Price | 1996 | WAS | 250000 | PG | 27 | 81 | 50 | 17.5 | 0.655 | ... | 2.358025 | 0.469136 | 2.345679 | 2.814815 | 5.135802 | 0.962963 | 0.049383 | 1.888889 | 2.271605 | 10.000000 |
| 8913 | Tim Legler | 1996 | WAS | 250000 | SG | 29 | 77 | 0 | 15.9 | 0.688 | ... | 1.987013 | 0.376623 | 1.441558 | 1.818182 | 1.766234 | 0.584416 | 0.155844 | 0.584416 | 1.831169 | 9.428571 |
8914 rows × 51 columns
# restrict the timeframe to 1996-2016 with a single boolean mask
drafts = drafts[(drafts['Draft Year'] >= 1996) & (drafts['Draft Year'] <= 2016)]
drafts['Draft Year'] = drafts['Draft Year'].astype(int)
drafts
| Draft Year | Round | Pick | Team | Player | College | Yrs | G | MP | PTS | ... | 3P% | FT% | Minuts Played | Points | Total Rebounds | Assists | Win Share | Win SharesS/48 | Box Plus/Minus | Value Over Replacement | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016 | 1 | 1.0 | PHI | Ben Simmons | Louisiana State University | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2016 | 1 | 2.0 | LAL | Brandon Ingram | Duke University | 1.0 | 42.0 | 1157.0 | 335.0 | ... | 0.278 | 0.691 | 27.5 | 8.0 | 4.1 | 2.0 | -0.2 | -0.006 | -4.3 | -0.7 |
| 2 | 2016 | 1 | 3.0 | BOS | Jaylen Brown | University of California | 1.0 | 38.0 | 497.0 | 180.0 | ... | 0.325 | 0.641 | 13.1 | 4.7 | 1.9 | 0.6 | 0.3 | 0.029 | -4.8 | -0.3 |
| 3 | 2016 | 1 | 4.0 | PHO | Dragan Bender | NaN | 1.0 | 31.0 | 382.0 | 97.0 | ... | 0.317 | 0.167 | 12.3 | 3.1 | 2.0 | 0.4 | -0.1 | -0.008 | -3.8 | -0.2 |
| 4 | 2016 | 1 | 5.0 | MIN | Kris Dunn | Providence College | 1.0 | 38.0 | 643.0 | 150.0 | ... | 0.294 | 0.619 | 16.9 | 3.9 | 2.2 | 2.5 | 0.2 | 0.012 | -2.5 | -0.1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1235 | 1996 | 2 | 54.0 | UTA | Shandon Anderson | University of Georgia | 10.0 | 719.0 | 15946.0 | 5327.0 | ... | 0.316 | 0.739 | 22.2 | 7.4 | 3.1 | 1.4 | 23.6 | 0.071 | -0.8 | 4.9 |
| 1236 | 1996 | 2 | 55.0 | WSB | Ronnie Henderson | Louisiana State University | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1237 | 1996 | 2 | 56.0 | CLE | Reggie Geary | University of Arizona | 2.0 | 101.0 | 931.0 | 209.0 | ... | 0.328 | 0.493 | 9.2 | 2.1 | 0.8 | 1.1 | 0.6 | 0.032 | -3.2 | -0.3 |
| 1238 | 1996 | 2 | 57.0 | SEA | Drew Barry | Georgia Institute of Technology | 3.0 | 60.0 | 598.0 | 134.0 | ... | 0.381 | 0.774 | 10.0 | 2.2 | 1.1 | 1.9 | 0.6 | 0.051 | -4.5 | -0.4 |
| 1239 | 1996 | 2 | 58.0 | DAL | Darnell Robinson | University of Arkansas | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1240 rows × 23 columns
In some instances, the drafted player did not come from a college, and instead came straight from high school. A clear example of this can be seen with LeBron James or Kobe Bryant, who were both drafted into the NBA right after their senior year of high school. Let's take a look at what the college cell value is for LeBron and Kobe.
leBron = drafts.loc[drafts["Player"] == "LeBron James"]
leBron
| Draft Year | Round | Pick | Team | Player | College | Yrs | G | MP | PTS | ... | 3P% | FT% | Minuts Played | Points | Total Rebounds | Assists | Win Share | Win SharesS/48 | Box Plus/Minus | Value Over Replacement | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 779 | 2003 | 1 | 1.0 | CLE | LeBron James | NaN | 14.0 | 1022.0 | 39772.0 | 27746.0 | ... | 0.341 | 0.743 | 38.9 | 27.1 | 7.2 | 6.9 | 198.6 | 0.24 | 9.2 | 112.2 |
1 rows × 23 columns
kobe = drafts.loc[drafts["Player"] == "Kobe Bryant"]
kobe
| Draft Year | Round | Pick | Team | Player | College | Yrs | G | MP | PTS | ... | 3P% | FT% | Minuts Played | Points | Total Rebounds | Assists | Win Share | Win SharesS/48 | Box Plus/Minus | Value Over Replacement | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1194 | 1996 | 1 | 13.0 | CHH | Kobe Bryant | NaN | 20.0 | 1346.0 | 48637.0 | 33643.0 | ... | 0.329 | 0.837 | 36.1 | 25.0 | 5.2 | 4.7 | 172.7 | 0.17 | 3.9 | 72.1 |
1 rows × 23 columns
It looks like the current value for a player with no NCAA experience is NaN, which is not very descriptive for us.
There are a few other cases as well, where the player was drafted into the NBA from international leagues or the G-League. To maintain consistency going forward, we can mark all of these NaN's as "No College", since that handles the majority of our inconsistent NaN's.
# marking all the drafted players that did not attend a college
drafts["College"].fillna("No College", inplace=True)
drafts
| Draft Year | Round | Pick | Team | Player | College | Yrs | G | MP | PTS | ... | 3P% | FT% | Minuts Played | Points | Total Rebounds | Assists | Win Share | Win SharesS/48 | Box Plus/Minus | Value Over Replacement | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016 | 1 | 1.0 | PHI | Ben Simmons | Louisiana State University | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2016 | 1 | 2.0 | LAL | Brandon Ingram | Duke University | 1.0 | 42.0 | 1157.0 | 335.0 | ... | 0.278 | 0.691 | 27.5 | 8.0 | 4.1 | 2.0 | -0.2 | -0.006 | -4.3 | -0.7 |
| 2 | 2016 | 1 | 3.0 | BOS | Jaylen Brown | University of California | 1.0 | 38.0 | 497.0 | 180.0 | ... | 0.325 | 0.641 | 13.1 | 4.7 | 1.9 | 0.6 | 0.3 | 0.029 | -4.8 | -0.3 |
| 3 | 2016 | 1 | 4.0 | PHO | Dragan Bender | No College | 1.0 | 31.0 | 382.0 | 97.0 | ... | 0.317 | 0.167 | 12.3 | 3.1 | 2.0 | 0.4 | -0.1 | -0.008 | -3.8 | -0.2 |
| 4 | 2016 | 1 | 5.0 | MIN | Kris Dunn | Providence College | 1.0 | 38.0 | 643.0 | 150.0 | ... | 0.294 | 0.619 | 16.9 | 3.9 | 2.2 | 2.5 | 0.2 | 0.012 | -2.5 | -0.1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1235 | 1996 | 2 | 54.0 | UTA | Shandon Anderson | University of Georgia | 10.0 | 719.0 | 15946.0 | 5327.0 | ... | 0.316 | 0.739 | 22.2 | 7.4 | 3.1 | 1.4 | 23.6 | 0.071 | -0.8 | 4.9 |
| 1236 | 1996 | 2 | 55.0 | WSB | Ronnie Henderson | Louisiana State University | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1237 | 1996 | 2 | 56.0 | CLE | Reggie Geary | University of Arizona | 2.0 | 101.0 | 931.0 | 209.0 | ... | 0.328 | 0.493 | 9.2 | 2.1 | 0.8 | 1.1 | 0.6 | 0.032 | -3.2 | -0.3 |
| 1238 | 1996 | 2 | 57.0 | SEA | Drew Barry | Georgia Institute of Technology | 3.0 | 60.0 | 598.0 | 134.0 | ... | 0.381 | 0.774 | 10.0 | 2.2 | 1.1 | 1.9 | 0.6 | 0.051 | -4.5 | -0.4 |
| 1239 | 1996 | 2 | 58.0 | DAL | Darnell Robinson | University of Arkansas | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1240 rows × 23 columns
We now have two separate dataframes with information about the players, so we can merge them into a single dataframe to use going forward. To do this, we first select all of the rows in the drafts dataframe whose players also appear in the salaries dataframe.
There is a lot of information in the drafts dataframe that won't be needed, so let's just select what we need: the player name, college, and draft year.
We can then merge these columns back into the salaries dataframe to get a single dataframe.
We will also reset the indices and create a new column for the number of years the player has been in the NBA, a statistic we might use later on.
# selecting necessary rows
selected_rows = drafts.loc[drafts['Player'].isin(list(salaries['Player'])), ["Player", "College", "Draft Year"]]
# merging and dropping any repeated rows
salaries = pd.merge(salaries, selected_rows, on="Player")
salaries = salaries.drop_duplicates(subset=['Player'])
salaries["College"].fillna("Unknown", inplace=True)
# calculating the Experience column
salaries["Experience"] = salaries['Year'] - salaries["Draft Year"]
salaries = salaries.reset_index()
salaries
| index | Player | Year | Tm | Salary | Pos | Age | G | GS | PER | ... | TRB/G | AST/G | STL/G | BLK/G | TOV/G | PF/G | PTS/G | College | Draft Year | Experience | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Al Horford | 2016 | ATL | 12000000 | C | 29 | 82 | 82 | 19.4 | ... | 7.268293 | 3.207317 | 0.829268 | 1.475610 | 1.304878 | 1.987805 | 15.231707 | University of Florida | 2007 | 9 |
| 1 | 9 | Tiago Splitter | 2016 | ATL | 8500000 | C | 31 | 36 | 2 | 13.7 | ... | 3.333333 | 0.833333 | 0.555556 | 0.333333 | 0.666667 | 1.972222 | 5.583333 | No College | 2007 | 9 |
| 2 | 15 | Jeff Teague | 2016 | ATL | 8000000 | PG | 27 | 79 | 78 | 17.9 | ... | 2.708861 | 5.949367 | 1.227848 | 0.303797 | 2.759494 | 2.113924 | 15.683544 | Wake Forest University | 2009 | 7 |
| 3 | 22 | Kyle Korver | 2016 | ATL | 5746479 | SG | 34 | 80 | 80 | 9.7 | ... | 3.250000 | 2.050000 | 0.750000 | 0.437500 | 1.225000 | 2.012500 | 9.237500 | Creighton University | 2003 | 13 |
| 4 | 35 | Thabo Sefolosha | 2016 | ATL | 4000000 | SF | 31 | 75 | 11 | 12.4 | ... | 4.453333 | 1.426667 | 1.133333 | 0.493333 | 0.906667 | 1.453333 | 6.400000 | No College | 2006 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 949 | 5486 | Chris Singleton | 2014 | WAS | 1618680 | SF | 24 | 25 | 0 | 8.8 | ... | 2.200000 | 0.240000 | 0.360000 | 0.120000 | 0.680000 | 1.000000 | 3.000000 | Florida State University | 2011 | 3 |
| 950 | 5489 | Hamady N'Diaye | 2012 | WAS | 270427 | C | 25 | 3 | 0 | -13.1 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | Rutgers University | 2010 | 2 |
| 951 | 5491 | James Lang | 2007 | WAS | 771331 | C | 23 | 11 | 0 | 6.6 | ... | 1.000000 | 0.181818 | 0.000000 | 0.272727 | 0.181818 | 1.272727 | 1.000000 | No College | 2003 | 4 |
| 952 | 5492 | Mike Smith | 2001 | WAS | 316969 | SF | 24 | 17 | 0 | 8.1 | ... | 1.294118 | 0.588235 | 0.294118 | 0.176471 | 0.411765 | 0.470588 | 3.000000 | University of Louisiana at Monroe | 2000 | 1 |
| 953 | 5493 | God Shammgod | 1998 | WAS | 242250 | PG | 21 | 20 | 0 | 9.2 | ... | 0.350000 | 1.800000 | 0.350000 | 0.050000 | 1.050000 | 1.350000 | 3.050000 | Providence College | 1997 | 1 |
954 rows × 55 columns
We now finally have a dataframe that we can use. Let's simply take a look at the columns that we will be working with and drop the unnecessary ones that were still in the salaries dataframe.
salaries.columns
Index(['index', 'Player', 'Year', 'Tm', 'Salary', 'Pos', 'Age', 'G', 'GS',
'PER', 'TS%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%',
'BLK%', 'TOV%', 'USG%', 'OWS', 'DWS', 'WS', 'WS/48', 'OBPM', 'DBPM',
'BPM', 'VORP', 'FG%', '3P%', '2P%', 'eFG%', 'FT%', 'MP/G', 'FG/G',
'FGA/G', '3P/G', '3PA/G', '2P/G', '2PA/G', 'FT/G', 'FTA/G', 'ORB/G',
'DRB/G', 'TRB/G', 'AST/G', 'STL/G', 'BLK/G', 'TOV/G', 'PF/G', 'PTS/G',
'College', 'Draft Year', 'Experience'],
dtype='object')
We don't really have a use for a lot of these columns. So, let's get rid of them.
salaries = salaries.drop(columns=['index', 'USG%', 'OWS', 'DWS', 'WS', 'WS/48', 'OBPM', 'DBPM', 'BPM', 'VORP', 'TRB%',"TRB/G"])
salaries
| Player | Year | Tm | Salary | Pos | Age | G | GS | PER | TS% | ... | DRB/G | AST/G | STL/G | BLK/G | TOV/G | PF/G | PTS/G | College | Draft Year | Experience | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Al Horford | 2016 | ATL | 12000000 | C | 29 | 82 | 82 | 19.4 | 0.565 | ... | 5.463415 | 3.207317 | 0.829268 | 1.475610 | 1.304878 | 1.987805 | 15.231707 | University of Florida | 2007 | 9 |
| 1 | Tiago Splitter | 2016 | ATL | 8500000 | C | 31 | 36 | 2 | 13.7 | 0.571 | ... | 2.083333 | 0.833333 | 0.555556 | 0.333333 | 0.666667 | 1.972222 | 5.583333 | No College | 2007 | 9 |
| 2 | Jeff Teague | 2016 | ATL | 8000000 | PG | 27 | 79 | 78 | 17.9 | 0.551 | ... | 2.291139 | 5.949367 | 1.227848 | 0.303797 | 2.759494 | 2.113924 | 15.683544 | Wake Forest University | 2009 | 7 |
| 3 | Kyle Korver | 2016 | ATL | 5746479 | SG | 34 | 80 | 80 | 9.7 | 0.578 | ... | 3.075000 | 2.050000 | 0.750000 | 0.437500 | 1.225000 | 2.012500 | 9.237500 | Creighton University | 2003 | 13 |
| 4 | Thabo Sefolosha | 2016 | ATL | 4000000 | SF | 31 | 75 | 11 | 12.4 | 0.578 | ... | 3.760000 | 1.426667 | 1.133333 | 0.493333 | 0.906667 | 1.453333 | 6.400000 | No College | 2006 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 949 | Chris Singleton | 2014 | WAS | 1618680 | SF | 24 | 25 | 0 | 8.8 | 0.481 | ... | 1.480000 | 0.240000 | 0.360000 | 0.120000 | 0.680000 | 1.000000 | 3.000000 | Florida State University | 2011 | 3 |
| 950 | Hamady N'Diaye | 2012 | WAS | 270427 | C | 25 | 3 | 0 | -13.1 | 0.000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | Rutgers University | 2010 | 2 |
| 951 | James Lang | 2007 | WAS | 771331 | C | 23 | 11 | 0 | 6.6 | 0.491 | ... | 0.545455 | 0.181818 | 0.000000 | 0.272727 | 0.181818 | 1.272727 | 1.000000 | No College | 2003 | 4 |
| 952 | Mike Smith | 2001 | WAS | 316969 | SF | 24 | 17 | 0 | 8.1 | 0.386 | ... | 0.705882 | 0.588235 | 0.294118 | 0.176471 | 0.411765 | 0.470588 | 3.000000 | University of Louisiana at Monroe | 2000 | 1 |
| 953 | God Shammgod | 1998 | WAS | 242250 | PG | 21 | 20 | 0 | 9.2 | 0.428 | ... | 0.250000 | 1.800000 | 0.350000 | 0.050000 | 1.050000 | 1.350000 | 3.050000 | Providence College | 1997 | 1 |
954 rows × 43 columns
We now have a dataframe with enough data to start visualizing what we want to.
Seaborn is a Python data visualization library that is built on top of Matplotlib and provides a high-level interface for creating a variety of statistical plots. To create a visualization showing the relationship between college and salary using Seaborn, you could use the barplot function and group the data by college to create separate bars for each college.
# Group the data by college and calculate the mean salary for each group
grouped = salaries.groupby('College')['Salary'].mean().reset_index()
# Create a bar chart
sns.barplot(x="College", y='Salary', data=grouped)
# Add a title
plt.title('Mean Salary by College')
# scaling and rotating x-labels
sns.set(font_scale=7.5)
plt.gcf().set_size_inches(200,180)
plt.xticks(rotation=90)
This code groups the salary data by college, calculates the mean salary for each group, and then uses Seaborn to create a bar chart showing the mean salary for each college. The x-axis of the chart shows the college names, and the y-axis shows the mean salary.
Hmmm... This bar graph shows the mean annual salary of each college during the timeframe; however, some of the results are unexpected. For example, Saint Louis University has an extremely high average salary for its NBA players compared to a much bigger basketball school like the University of Kentucky.
Let's take a quick peek at Saint Louis University to see the information behind the players that went there.
grouped = salaries.groupby('College')
grouped.get_group("Saint Louis University")
| Player | Year | Tm | Salary | Pos | Age | G | GS | PER | TS% | ... | DRB/G | AST/G | STL/G | BLK/G | TOV/G | PF/G | PTS/G | College | Draft Year | Experience | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 265 | Larry Hughes | 2008 | CHI | 12000084 | SG | 29 | 28 | 25 | 12.4 | 0.469 | ... | 2.678571 | 3.107143 | 1.392857 | 0.214286 | 1.571429 | 1.714286 | 12.0 | Saint Louis University | 1998 | 10 |
1 rows × 43 columns
As we can now tell, the reason Saint Louis University has such a high average salary is that only one player in our data went there: Larry Hughes, who made $12,000,084 in 2008 after 10 years in the league.
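One way to guard against such single-player outliers is to compute the player count alongside the mean salary in the same aggregation, so thinly-represented colleges are easy to flag. Here is a minimal sketch on a toy dataframe (the `College` and `Salary` column names mirror our salaries dataframe; the numbers are made up):

```python
import pandas as pd

# toy stand-in for our salaries dataframe
df = pd.DataFrame({
    "College": ["Duke University", "Duke University", "Saint Louis University"],
    "Salary": [8_000_000, 6_000_000, 12_000_084],
})

# aggregate the mean salary and the player count per college in one pass
summary = df.groupby("College")["Salary"].agg(["mean", "count"])

# colleges represented by only a single player
small_sample = summary[summary["count"] == 1]
print(small_sample)
```

Dropping, or at least annotating, these single-player colleges would make a bar chart like the one above far less misleading.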
Let's see what the number of players looks like for the rest of the colleges while we are at it.
# Make a dictionary with keys as colleges and values as the number of attendees
colleges_count = defaultdict(int)
# counting how many players came from each college
for i, row in salaries.iterrows():
    colleges_count[row["College"]] += 1
# sorting the (college, count) pairs by count
colleges_count = sorted(colleges_count.items(), key=lambda x: x[1])
colleges_count
[('Bucknell University', 1),
('Louisiana Tech University', 1),
('Shaw University', 1),
('University of Nebraska', 1),
('Southeastern Illinois College', 1),
('Butler County Community College', 1),
('Old Dominion University', 1),
('Central State University', 1),
('Georgia State University', 1),
('University of Tennessee at Martin', 1),
('Miami University', 1),
('University of the Pacific', 1),
('Ohio University', 1),
('Okaloosa-Walton Community College', 1),
('Wright State University', 1),
('University of Idaho', 1),
('Eastern Michigan University', 1),
('Northwestern University', 1),
('Saint Louis University', 1),
('Valparaiso University', 1),
('Central Michigan University', 1),
('University of Arkansas at Little Rock', 1),
('Augsburg College', 1),
('Pennsylvania State University', 1),
('Santa Clara University', 1),
('San Jose State University', 1),
('Morehead State University', 1),
('Virginia Polytechnic Institute and State University', 1),
('University of Tennessee at Chattanooga', 1),
('University of Richmond', 1),
('Eastern Washington University', 1),
('University of Alabama at Birmingham', 1),
('Davidson College', 1),
('Rider University', 1),
('Colgate University', 1),
('Manhattan College', 1),
('College of William & Mary', 1),
('Kansas State University', 1),
('Western Carolina University', 1),
('University of Central Florida', 1),
('Ball State University', 1),
('University of West Florida', 1),
('University of Houston', 1),
('Indiana University-Purdue University Indianapolis', 1),
('University of California, Santa Barbara', 1),
('Texas State University', 1),
('La Salle University', 1),
('University of Texas at El Paso', 1),
('Cleveland State University', 1),
('University of North Dakota', 1),
('Indian Hills Community College', 1),
('South Dakota State University', 1),
('Tulane University', 1),
('University of Wisconsin-Green Bay', 1),
('Colorado State University', 1),
('Texas Christian University', 1),
('Central Connecticut State University', 1),
('Drexel University', 1),
('Norfolk State University', 1),
('Wichita State University', 1),
('Florida Agricultural and Mechanical University', 1),
('Northeast Mississippi Community College', 1),
('St. Bonaventure University', 1),
('Southern Methodist University', 1),
('University of Toledo', 1),
('California State University, Bakersfield', 1),
('Weber State University', 1),
('Lehigh University', 1),
('California State University, Fullerton', 1),
('Barton County Community College', 1),
('Tennessee Technological University', 1),
('Rice University', 1),
('Walsh University', 1),
('University of Louisiana at Monroe', 1),
('Butler University', 2),
('Oregon State University', 2),
('University of Detroit Mercy', 2),
('Hofstra University', 2),
('University of South Florida', 2),
('College of Charleston', 2),
("Saint Joseph's University", 2),
('University of Iowa', 2),
('University of Mississippi', 2),
('University of Louisiana at Lafayette', 2),
('University of Tulsa', 2),
('Austin Peay State University', 2),
('Pepperdine University', 2),
('Marshall University', 2),
('West Virginia University', 2),
('Bradley University', 2),
('University of South Carolina', 2),
('Seton Hall University', 2),
('University of Rhode Island', 2),
('University of North Carolina at Charlotte', 2),
('California State University, Long Beach', 2),
('Bowling Green State University', 2),
('Virginia Commonwealth University', 2),
('Rutgers University', 2),
('Creighton University', 3),
('Brigham Young University', 3),
('Western Kentucky University', 3),
('Arizona State University', 3),
('Texas Tech University', 3),
('Temple University', 3),
('Washington State University', 3),
('San Diego State University', 3),
('University of Massachusetts Amherst', 3),
('Murray State University', 3),
('Auburn University', 3),
('Vanderbilt University', 4),
('Mississippi State University', 4),
('Texas A&M University', 4),
('DePaul University', 4),
('University of Oklahoma', 4),
('Clemson University', 4),
("St. John's University", 4),
('University of Virginia', 5),
('University of Arkansas', 5),
('University of Utah', 5),
('University of Alabama', 5),
('Providence College', 5),
('Purdue University', 5),
('University of Nevada, Las Vegas', 5),
('University of Colorado', 5),
('University of Miami', 5),
('Boston College', 5),
('University of Oregon', 5),
('University of Minnesota', 6),
('University of Missouri', 6),
('Gonzaga University', 6),
('University of Wisconsin', 6),
('Xavier University', 6),
('University of Georgia', 6),
('Baylor University', 6),
('North Carolina State University', 6),
('University of Nevada, Reno', 6),
('University of Notre Dame', 6),
('Wake Forest University', 7),
('Oklahoma State University', 7),
('University of New Mexico', 7),
('University of Tennessee', 7),
('Iowa State University', 7),
('University of Pittsburgh', 8),
('University of Cincinnati', 8),
('University of Illinois at Urbana-Champaign', 8),
('University of California', 9),
('Marquette University', 9),
('Florida State University', 9),
('University of Southern California', 9),
('California State University, Fresno', 9),
('University of Michigan', 10),
('Ohio State University', 10),
('University of Louisville', 10),
('Louisiana State University', 10),
('Villanova University', 10),
('University of Washington', 11),
('Indiana University', 11),
('University of Memphis', 12),
('Georgetown University', 12),
('Michigan State University', 12),
('University of Maryland', 13),
('Stanford University', 14),
('Georgia Institute of Technology', 14),
('University of Florida', 15),
('University of Texas at Austin', 16),
('University of Connecticut', 18),
('Syracuse University', 19),
('University of North Carolina', 23),
('University of Kansas', 26),
('University of Arizona', 26),
('University of California, Los Angeles', 26),
('Duke University', 29),
('University of Kentucky', 35),
('No College', 170)]
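As an aside, pandas can do this whole count-and-sort in one line with `value_counts`, which should produce the same (college, count) pairs as the loop above. A quick sketch on a toy dataframe:

```python
import pandas as pd

# toy stand-in for our salaries dataframe
df = pd.DataFrame({"College": ["Duke University", "No College",
                               "Duke University", "No College", "No College"]})

# count each distinct college; ascending=True mirrors our sort order
counts = df["College"].value_counts(ascending=True)
print(list(counts.items()))
```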
Let's also graph these counts on a barplot while we are here.
colleges = []
counts = []
# separating the tuples into 2 lists and then plotting
for college, count in colleges_count:
colleges.append(college)
counts.append(count)
# plotting
sns.barplot(x=colleges, y=counts)
sns.set(font_scale=7.5)
plt.gcf().set_size_inches(200,180)
plt.xticks(rotation=90)
As we can see from the barplot above, "No College" was the highest, with around 170 drafted players not attending a college at all. Let's take a look at the top 5 attended schools and bottom 5 attended schools.
Just as some additional visual analysis, why don't we plot the count of players that went to the NBA from these selected NCAA schools?
In order to do this, we will take advantage of the Folium library to map the center of the USA and color a marker for each school based on the approximate number of players that attended it.
The scale will be based on attendee count and will be as follows:
1 person: GREEN
2 <= count <= 10: BLUE
11 <= count <= 35: PURPLE
36 <= count: RED
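This color scale can be captured in one small helper function. This is a sketch of our own (the name `count_to_color` is ours; the plotting loop below inlines the same logic with if/elif branches):

```python
def count_to_color(count):
    """Map a school's NBA-player count to a folium marker color string."""
    if count == 1:
        return "green"
    elif 2 <= count <= 10:
        return "blue"
    elif 11 <= count <= 35:
        return "purple"
    else:  # 36 or more
        return "red"

print(count_to_color(26))  # prints "purple"
```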
Let's create the map and center it on the USA.
# centering map in the center of USA (39.8282, -98.5795)
map_osm = folium.Map(location=[39.8282, -98.5795], zoom_start=4, tiles="OpenStreetMap")
Before we can plot these points on the map, we need to get the longitude and latitudes for each college.
We can easily do this by using the Geocoding API from Google! First, we will the API to obtain the longitude and latitude coordinates for a location. The Google Maps Geocoding API is a web service that allows you to convert addresses (such as "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (such as latitude 37.423021 and longitude -122.083739), and vice versa.
To use the Google Maps Geocoding API on your own, you will need to sign up for a Google Cloud account and obtain an API key. You can then use that key to make requests to the API and retrieve the longitude and latitude coordinates for a location. The API is free to use within Google's usage limits. To run the code below yourself, plug in your own key.
def get_long_lat(school):
    # personal api_key -> to get your own, you will need to sign up for a Google Cloud account
    api_key = "YOUR_API_KEY"
    # querying geographical location info
    api_response = requests.get('https://maps.googleapis.com/maps/api/geocode/json',
                                params={'address': school, 'key': api_key})
    api_response_dict = api_response.json()
    latitude = longitude = "NULL"
    # getting longitude and latitude
    if api_response_dict['status'] == 'OK':
        latitude = api_response_dict['results'][0]['geometry']['location']['lat']
        longitude = api_response_dict['results'][0]['geometry']['location']['lng']
    return (longitude, latitude)
for college, count in colleges_count:
    if college == "No College":
        continue
    # getting longitude and latitude
    data = get_long_lat(college)
    # plotting colored markers at longitudes and latitudes based on count
    if data[1] != "NULL" and data[0] != "NULL":
        if count == 1:
            folium.Marker(location=[data[1], data[0]], icon=folium.Icon(color='green')).add_to(map_osm)
        elif 2 <= count <= 10:
            folium.Marker(location=[data[1], data[0]], icon=folium.Icon(color='blue')).add_to(map_osm)
        elif 11 <= count <= 35:
            folium.Marker(location=[data[1], data[0]], icon=folium.Icon(color='purple')).add_to(map_osm)
        else:
            folium.Marker(location=[data[1], data[0]], icon=folium.Icon(color='red')).add_to(map_osm)
map_osm
Based on the map above, it looks like a majority of the schools that sent players to the NBA are on the east coast. Many east coast schools also have purple markers, indicating that some schools in that region sent a relatively large number of players to the NBA.
Now, let's take a look at the 5 most attended schools and the 5 least attended schools. We can do this by slicing the previously made list of (college, count) tuples where necessary.
# splitting the list of tuples to bottom 5
print(colleges_count[:5])
[('Bucknell University', 1), ('Louisiana Tech University', 1), ('Shaw University', 1), ('University of Nebraska', 1), ('Southeastern Illinois College', 1)]
# splitting the list of tuples to top 5
print(colleges_count[len(colleges_count) - 5 : len(colleges_count)])
[('University of Arizona', 26), ('University of California, Los Angeles', 26), ('Duke University', 29), ('University of Kentucky', 35), ('No College', 170)]
Now, let's take a look at how each of these 10 schools compares in terms of average salary.
# graphing top 5 means
colleges = []
salary = []
for college, count in colleges_count[len(colleges_count) - 5 : len(colleges_count)]:
    colleges.append(college)
    salary.append(grouped.get_group(college)['Salary'].mean())
# plotting
sns.barplot(x=colleges, y=salary)
plt.title('Mean Salary by Top 5 Colleges')
# scaling and rotating x-labels
sns.set(font_scale=1.5)
plt.gcf().set_size_inches(30, 30)
plt.xticks(rotation=90)
# plotting bottom 5 means
colleges = []
salary = []
for college, count in colleges_count[:5]:
    colleges.append(college)
    salary.append(grouped.get_group(college)['Salary'].mean())
# plotting
sns.barplot(x=colleges, y=salary)
plt.title('Mean Salary by Bottom 5 Colleges')
# scaling and rotating x-labels
sns.set(font_scale=0.5)
plt.gcf().set_size_inches(30, 30)
plt.xticks(rotation=90)
Based on the barplots above, we can see that Louisiana Tech, one of the least-attended schools, has a very high average NBA salary compared to the other colleges on that plot. Let's take a look at who that player is and why that might be the case.
grouped.get_group("Louisiana Tech University")
| Player | Year | Tm | Salary | Pos | Age | G | GS | PER | TS% | ... | DRB/G | AST/G | STL/G | BLK/G | TOV/G | PF/G | PTS/G | College | Draft Year | Experience | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | Paul Millsap | 2015 | ATL | 9500000 | PF | 29 | 73 | 73 | 20.0 | 0.565 | ... | 5.90411 | 3.054795 | 1.780822 | 0.945205 | 2.273973 | 2.753425 | 16.684932 | Louisiana Tech University | 2006 | 9 |
1 rows × 43 columns
Looking at Paul Millsap's salary, one could guess that players who perform statistically better in terms of points per game and field goal percentage are the ones that get paid much more.
On the other hand, one could make the case that the more experience a player has, the more likely they are to earn a higher salary. Let's take a closer look at the data to see if either of these assumptions is accurate.
Let's dive a little deeper into the first case and explore why Paul Millsap got paid so much in 2015. To do this, why don't we compare Paul Millsap's statistics to those of his teammates on the Atlanta Hawks during 2015.
# grabbing players that were on the Atlanta Hawks in the year 2015
grouped_NBA = salaries.groupby('Tm')
atl_2015 = grouped_NBA.get_group("ATL").loc[grouped_NBA.get_group("ATL")["Year"] == 2015]
atl_2015
| Player | Year | Tm | Salary | Pos | Age | G | GS | PER | TS% | ... | DRB/G | AST/G | STL/G | BLK/G | TOV/G | PF/G | PTS/G | College | Draft Year | Experience | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | Paul Millsap | 2015 | ATL | 9500000 | PF | 29 | 73 | 73 | 20.0 | 0.565 | ... | 5.904110 | 3.054795 | 1.780822 | 0.945205 | 2.273973 | 2.753425 | 16.684932 | Louisiana Tech University | 2006 | 9 |
| 13 | DeMarre Carroll | 2015 | ATL | 2442455 | SF | 28 | 70 | 69 | 15.9 | 0.603 | ... | 3.900000 | 1.685714 | 1.342857 | 0.242857 | 1.071429 | 2.200000 | 12.614286 | University of Missouri | 2009 | 6 |
| 14 | Shelvin Mack | 2015 | ATL | 2433333 | PG | 24 | 55 | 0 | 13.2 | 0.489 | ... | 1.290909 | 2.818182 | 0.545455 | 0.036364 | 0.890909 | 0.581818 | 5.436364 | Butler University | 2011 | 4 |
| 15 | Elton Brand | 2015 | ATL | 2000000 | C | 35 | 36 | 4 | 9.4 | 0.457 | ... | 2.000000 | 0.611111 | 0.472222 | 0.694444 | 0.500000 | 1.500000 | 2.666667 | Duke University | 1999 | 16 |
| 16 | John Jenkins | 2015 | ATL | 1312920 | SG | 23 | 24 | 3 | 15.9 | 0.629 | ... | 1.541667 | 0.541667 | 0.416667 | 0.000000 | 0.333333 | 0.625000 | 5.625000 | Vanderbilt University | 2012 | 3 |
| 17 | Austin Daye | 2015 | ATL | 125104 | SF | 26 | 8 | 0 | 12.1 | 0.484 | ... | 1.250000 | 1.000000 | 0.500000 | 0.250000 | 0.625000 | 1.125000 | 3.250000 | Gonzaga University | 2009 | 6 |
6 rows × 43 columns
And to better visualize his performance, let's plot how each player compares to Paul Millsap's statistics.
sns.barplot(data=atl_2015, x='Player', y='PTS/G', hue='College')
plt.title('Points Per Game For Each Player On the 2015 Hawks')
# scaling and rotating x-labels
sns.set(font_scale=1.35)
plt.xticks(rotation=90)
plt.legend(prop={'size': 10})
As seen from the above barplot, Paul Millsap clearly outscores the other players in terms of average points per game. Let's now look at his field goal percentage.
sns.barplot(data=atl_2015, x='Player', y='FG%', hue='College')
plt.title('FG % For Each Player On the 2015 Hawks')
# scaling and rotating x-labels
sns.set(font_scale=1.35)
plt.xticks(rotation=90)
plt.legend(prop={'size': 10})
As seen from the above barplot, even though Paul Millsap did not have the highest field goal percentage on the team at the time, he was a very close third. Based on these statistics, along with the others seen in the dataframe, it is clear that Paul Millsap was a crucial player for the Hawks who deserved the salary he received.
Now, why don't we go back a little bit and take a look at the second case, where we want to see how Paul Millsap's experience compares in relation to his salary.
sns.barplot(data=atl_2015, x='Player', y='Experience', hue='College')
plt.title('Years of Experience For Each Player On the 2015 Hawks')
# scaling and rotating x-labels
sns.set(font_scale=1.35)
plt.xticks(rotation=90)
plt.legend(prop={'size': 10})
It seems that Paul Millsap was second in terms of experience on this 2015 Atlanta Hawks team, though well behind the most experienced player, Elton Brand.
Looking back on our motivation, along with some of the assumptions that we just made, we originally wanted to find out if the most-attended NCAA schools allow for their NBA-bound players to get paid more compared to the least-attended NCAA schools. Let's now test these claims.
Relationship between Salary and School
Let's revisit our initial exploration of the relationship between the NCAA college that a player attended and the salary that they currently make in the NBA.
For our hypothesis test, the null hypothesis will be that there is no difference between the mean salaries of players who attended the 5 most popular colleges and that of the players who attended one of the 5 least popular colleges. The alternative hypothesis of our test is that there is a difference in the mean salaries between our two populations.
There are many types of tests that we could use here; however, because one of our samples has fewer than 30 observations (the players who attended the 5 least popular colleges), a 2-sample t-test will fit best.
A 2-sample t-test is a statistical test that is used to compare the means of two samples. It is a type of hypothesis test that is commonly used to determine if there is a significant difference between the means of two groups.
The t-test works by comparing the means of the two samples to a hypothesized mean difference (usually 0). If the observed mean difference is statistically significant, the t-test will reject the null hypothesis (which states that there is no difference between the means of the two samples).
We will be using the ttest_ind function from the scipy.stats module. This function takes in two arrays of sample data and returns the t-statistic and p-value for the test. We will require a p-value of 0.05 or less to reject the null hypothesis.
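As a quick illustration of how `ttest_ind` behaves, here is a sketch on made-up numbers (not our salary data), drawing two samples with the same true mean:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# two synthetic samples drawn from the same distribution
a = rng.normal(loc=5.0, scale=2.0, size=25)
b = rng.normal(loc=5.0, scale=2.0, size=25)

# equal_var=False selects Welch's variant of the test
t_stat, p_val = ttest_ind(a, b, equal_var=False)
# with identical true means, the p-value will usually be large,
# so we would fail to reject the null hypothesis
```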
The first thing we need to check is if the variances are equal.
# grabbing the df's for all the top 5 and bottom 5 most attended schools
top_5_dfs = []
bottom_5_dfs = []
# we have a groupby for the colleges, which returns df's, so let's take advantage of it
for college, count in colleges_count[len(colleges_count) - 5 : len(colleges_count)]:
    top_5_dfs.append(grouped.get_group(college))
for college, count in colleges_count[:5]:
    bottom_5_dfs.append(grouped.get_group(college))
# we can add all the df groups of colleges to an array and just concat them together
top_5_df = pd.concat(top_5_dfs)
bottom_5_df = pd.concat(bottom_5_dfs)
# using numpy to find variance
variance_top5 = np.var(top_5_df["Salary"])
print("Variance of salaries for players that attended top 5 most popular schools: ", variance_top5)
Variance of salaries for players that attended top 5 most popular schools: 20668779761012.984
variance_bott5 = np.var(bottom_5_df["Salary"])
print("Variance of salaries for players that attended 5 least popular schools: ", variance_bott5)
Variance of salaries for players that attended 5 least popular schools:  10857566732232.959
As we can see, the variances of the two populations are not the same, so we should use Welch's t-test, a variant of the two-sample t-test that does not assume equal variances.
# using scipy's ttest_ind to do a 2 sample t-test
t_statistic, p_value = ttest_ind(top_5_df["Salary"], bottom_5_df["Salary"], equal_var=False)
print("Test statistic = ", t_statistic)
print("P-value = ", p_value)
Test statistic =  0.3941971323001333
P-value =  0.7125651976920424
As we can see from above, the p-value is 0.71256, which is much greater than our significance threshold of 0.05. We therefore fail to reject the null hypothesis, as there is not sufficient evidence to conclude that the alternative holds.
In terms of our test, this means there is not sufficient evidence to conclude that there is a difference between the salaries of players who went to the 5 most attended colleges and the salaries of players who went to the 5 least attended colleges.
Based on the results of this hypothesis test, let's fit a linear regression where the college a player attended is the independent variable, and their annual salary is the dependent variable. The null hypothesis for each college is that it is NOT related to salary, and the alternative hypothesis is that it IS related to salary. We will require a p-value of 0.05 or less to reject the null hypothesis.
To fit a linear regression on two variables, we can use the "Ordinary Least Squares" (OLS) method to estimate the parameters of the model. We will fit the model using the OLS class from the statsmodels library.
# Convert the independent variable to a numeric data type using one-hot encoding
X = pd.get_dummies(salaries["College"])
# Extract the dependent variable from the dataframe
y = salaries["Salary"]
# Fit the linear regression model using OLS
lm = sm.OLS(y, X).fit()
lm.summary()
| Dep. Variable: | Salary | R-squared: | 0.155 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | -0.033 |
| Method: | Least Squares | F-statistic: | 0.8246 |
| Date: | Sat, 17 Dec 2022 | Prob (F-statistic): | 0.941 |
| Time: | 01:05:31 | Log-Likelihood: | -15850. |
| No. Observations: | 954 | AIC: | 3.205e+04 |
| Df Residuals: | 780 | BIC: | 3.289e+04 |
| Df Model: | 173 | | |
| Covariance Type: | nonrobust | | |
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| Arizona State University | 6.798e+06 | 2.54e+06 | 2.678 | 0.008 | 1.82e+06 | 1.18e+07 |
| Auburn University | 5.685e+05 | 2.54e+06 | 0.224 | 0.823 | -4.41e+06 | 5.55e+06 |
| Augsburg College | 1.6e+06 | 4.4e+06 | 0.364 | 0.716 | -7.03e+06 | 1.02e+07 |
| Austin Peay State University | 2.305e+06 | 3.11e+06 | 0.741 | 0.459 | -3.8e+06 | 8.41e+06 |
| Ball State University | 2.115e+06 | 4.4e+06 | 0.481 | 0.631 | -6.52e+06 | 1.07e+07 |
| Barton County Community College | 1.326e+06 | 4.4e+06 | 0.302 | 0.763 | -7.3e+06 | 9.96e+06 |
| Baylor University | 1.353e+06 | 1.79e+06 | 0.754 | 0.451 | -2.17e+06 | 4.88e+06 |
| Boston College | 4.07e+06 | 1.97e+06 | 2.070 | 0.039 | 2.11e+05 | 7.93e+06 |
| Bowling Green State University | 1.6e+06 | 3.11e+06 | 0.515 | 0.607 | -4.5e+06 | 7.7e+06 |
| Bradley University | 2.233e+06 | 3.11e+06 | 0.718 | 0.473 | -3.87e+06 | 8.34e+06 |
| Brigham Young University | 8.869e+05 | 2.54e+06 | 0.349 | 0.727 | -4.1e+06 | 5.87e+06 |
| Bucknell University | 9.473e+05 | 4.4e+06 | 0.215 | 0.829 | -7.68e+06 | 9.58e+06 |
| Butler County Community College | 1e+06 | 4.4e+06 | 0.227 | 0.820 | -7.63e+06 | 9.63e+06 |
| Butler University | 8.921e+06 | 3.11e+06 | 2.870 | 0.004 | 2.82e+06 | 1.5e+07 |
| California State University, Bakersfield | 2.42e+05 | 4.4e+06 | 0.055 | 0.956 | -8.39e+06 | 8.87e+06 |
| California State University, Fresno | 3.13e+06 | 1.47e+06 | 2.136 | 0.033 | 2.53e+05 | 6.01e+06 |
| California State University, Fullerton | 7.446e+05 | 4.4e+06 | 0.169 | 0.866 | -7.89e+06 | 9.38e+06 |
| California State University, Long Beach | 5.435e+05 | 3.11e+06 | 0.175 | 0.861 | -5.56e+06 | 6.65e+06 |
| Central Connecticut State University | 3.195e+05 | 4.4e+06 | 0.073 | 0.942 | -8.31e+06 | 8.95e+06 |
| Central Michigan University | 8e+06 | 4.4e+06 | 1.820 | 0.069 | -6.31e+05 | 1.66e+07 |
| Central State University | 5.48e+05 | 4.4e+06 | 0.125 | 0.901 | -8.08e+06 | 9.18e+06 |
| Clemson University | 2.92e+06 | 2.2e+06 | 1.328 | 0.184 | -1.39e+06 | 7.24e+06 |
| Cleveland State University | 1.129e+06 | 4.4e+06 | 0.257 | 0.797 | -7.5e+06 | 9.76e+06 |
| Colgate University | 8.125e+06 | 4.4e+06 | 1.848 | 0.065 | -5.06e+05 | 1.68e+07 |
| College of Charleston | 1.42e+06 | 3.11e+06 | 0.457 | 0.648 | -4.68e+06 | 7.52e+06 |
| College of William & Mary | 1.186e+06 | 4.4e+06 | 0.270 | 0.787 | -7.44e+06 | 9.82e+06 |
| Colorado State University | 2.5e+06 | 4.4e+06 | 0.569 | 0.570 | -6.13e+06 | 1.11e+07 |
| Creighton University | 2.957e+06 | 2.54e+06 | 1.165 | 0.244 | -2.03e+06 | 7.94e+06 |
| Davidson College | 1.137e+07 | 4.4e+06 | 2.586 | 0.010 | 2.74e+06 | 2e+07 |
| DePaul University | 5.593e+06 | 2.2e+06 | 2.544 | 0.011 | 1.28e+06 | 9.91e+06 |
| Drexel University | 2.2e+05 | 4.4e+06 | 0.050 | 0.960 | -8.41e+06 | 8.85e+06 |
| Duke University | 4.317e+06 | 8.16e+05 | 5.288 | 0.000 | 2.71e+06 | 5.92e+06 |
| Eastern Michigan University | 5.408e+05 | 4.4e+06 | 0.123 | 0.902 | -8.09e+06 | 9.17e+06 |
| Eastern Washington University | 8.5e+06 | 4.4e+06 | 1.933 | 0.054 | -1.31e+05 | 1.71e+07 |
| Florida Agricultural and Mechanical University | 5.8e+06 | 4.4e+06 | 1.319 | 0.187 | -2.83e+06 | 1.44e+07 |
| Florida State University | 1.003e+06 | 1.47e+06 | 0.684 | 0.494 | -1.87e+06 | 3.88e+06 |
| Georgetown University | 7.316e+06 | 1.27e+06 | 5.764 | 0.000 | 4.82e+06 | 9.81e+06 |
| Georgia Institute of Technology | 6.133e+06 | 1.18e+06 | 5.219 | 0.000 | 3.83e+06 | 8.44e+06 |
| Georgia State University | 1.149e+06 | 4.4e+06 | 0.261 | 0.794 | -7.48e+06 | 9.78e+06 |
| Gonzaga University | 1.977e+06 | 1.79e+06 | 1.102 | 0.271 | -1.55e+06 | 5.5e+06 |
| Hofstra University | 3.116e+06 | 3.11e+06 | 1.002 | 0.317 | -2.99e+06 | 9.22e+06 |
| Indian Hills Community College | 3.328e+05 | 4.4e+06 | 0.076 | 0.940 | -8.3e+06 | 8.96e+06 |
| Indiana University | 2.324e+06 | 1.33e+06 | 1.753 | 0.080 | -2.78e+05 | 4.93e+06 |
| Indiana University-Purdue University Indianapolis | 8e+06 | 4.4e+06 | 1.820 | 0.069 | -6.31e+05 | 1.66e+07 |
| Iowa State University | 2.91e+06 | 1.66e+06 | 1.751 | 0.080 | -3.52e+05 | 6.17e+06 |
| Kansas State University | 3.065e+05 | 4.4e+06 | 0.070 | 0.944 | -8.32e+06 | 8.94e+06 |
| La Salle University | 2.4e+06 | 4.4e+06 | 0.546 | 0.585 | -6.23e+06 | 1.1e+07 |
| Lehigh University | 2.525e+06 | 4.4e+06 | 0.574 | 0.566 | -6.11e+06 | 1.12e+07 |
| Louisiana State University | 2.589e+06 | 1.39e+06 | 1.862 | 0.063 | -1.41e+05 | 5.32e+06 |
| Louisiana Tech University | 9.5e+06 | 4.4e+06 | 2.161 | 0.031 | 8.69e+05 | 1.81e+07 |
| Manhattan College | 3.853e+05 | 4.4e+06 | 0.088 | 0.930 | -8.25e+06 | 9.02e+06 |
| Marquette University | 5.2e+06 | 1.47e+06 | 3.548 | 0.000 | 2.32e+06 | 8.08e+06 |
| Marshall University | 7.725e+05 | 3.11e+06 | 0.248 | 0.804 | -5.33e+06 | 6.88e+06 |
| Miami University | 1.178e+07 | 4.4e+06 | 2.678 | 0.008 | 3.14e+06 | 2.04e+07 |
| Michigan State University | 4.684e+06 | 1.27e+06 | 3.691 | 0.000 | 2.19e+06 | 7.18e+06 |
| Mississippi State University | 6.594e+05 | 2.2e+06 | 0.300 | 0.764 | -3.66e+06 | 4.97e+06 |
| Morehead State University | 1.124e+07 | 4.4e+06 | 2.556 | 0.011 | 2.61e+06 | 1.99e+07 |
| Murray State University | 1.004e+06 | 2.54e+06 | 0.396 | 0.693 | -3.98e+06 | 5.99e+06 |
| No College | 3.78e+06 | 3.37e+05 | 11.208 | 0.000 | 3.12e+06 | 4.44e+06 |
| Norfolk State University | 3.75e+06 | 4.4e+06 | 0.853 | 0.394 | -4.88e+06 | 1.24e+07 |
| North Carolina State University | 1.321e+06 | 1.79e+06 | 0.736 | 0.462 | -2.2e+06 | 4.84e+06 |
| Northeast Mississippi Community College | 5.998e+05 | 4.4e+06 | 0.136 | 0.892 | -8.03e+06 | 9.23e+06 |
| Northwestern University | 4.235e+05 | 4.4e+06 | 0.096 | 0.923 | -8.21e+06 | 9.05e+06 |
| Ohio State University | 4.591e+06 | 1.39e+06 | 3.302 | 0.001 | 1.86e+06 | 7.32e+06 |
| Ohio University | 3.669e+05 | 4.4e+06 | 0.083 | 0.934 | -8.26e+06 | 9e+06 |
| Okaloosa-Walton Community College | 1.643e+06 | 4.4e+06 | 0.374 | 0.709 | -6.99e+06 | 1.03e+07 |
| Oklahoma State University | 1.895e+06 | 1.66e+06 | 1.141 | 0.254 | -1.37e+06 | 5.16e+06 |
| Old Dominion University | 1.18e+06 | 4.4e+06 | 0.268 | 0.789 | -7.45e+06 | 9.81e+06 |
| Oregon State University | 9.788e+05 | 3.11e+06 | 0.315 | 0.753 | -5.12e+06 | 7.08e+06 |
| Pennsylvania State University | 5.901e+06 | 4.4e+06 | 1.342 | 0.180 | -2.73e+06 | 1.45e+07 |
| Pepperdine University | 6.972e+05 | 3.11e+06 | 0.224 | 0.823 | -5.41e+06 | 6.8e+06 |
| Providence College | 2.047e+06 | 1.97e+06 | 1.041 | 0.298 | -1.81e+06 | 5.91e+06 |
| Purdue University | 1.55e+06 | 1.97e+06 | 0.788 | 0.431 | -2.31e+06 | 5.41e+06 |
| Rice University | 1.081e+06 | 4.4e+06 | 0.246 | 0.806 | -7.55e+06 | 9.71e+06 |
| Rider University | 6.431e+06 | 4.4e+06 | 1.463 | 0.144 | -2.2e+06 | 1.51e+07 |
| Rutgers University | 8.487e+05 | 3.11e+06 | 0.273 | 0.785 | -5.25e+06 | 6.95e+06 |
| Saint Joseph's University | 2.785e+06 | 3.11e+06 | 0.896 | 0.371 | -3.32e+06 | 8.89e+06 |
| Saint Louis University | 1.2e+07 | 4.4e+06 | 2.729 | 0.006 | 3.37e+06 | 2.06e+07 |
| San Diego State University | 5.524e+06 | 2.54e+06 | 2.176 | 0.030 | 5.41e+05 | 1.05e+07 |
| San Jose State University | 5.625e+06 | 4.4e+06 | 1.279 | 0.201 | -3.01e+06 | 1.43e+07 |
| Santa Clara University | 5.75e+06 | 4.4e+06 | 1.308 | 0.191 | -2.88e+06 | 1.44e+07 |
| Seton Hall University | 2.931e+06 | 3.11e+06 | 0.943 | 0.346 | -3.17e+06 | 9.03e+06 |
| Shaw University | 1.5e+06 | 4.4e+06 | 0.341 | 0.733 | -7.13e+06 | 1.01e+07 |
| South Dakota State University | 8.165e+05 | 4.4e+06 | 0.186 | 0.853 | -7.81e+06 | 9.45e+06 |
| Southeastern Illinois College | 6.417e+05 | 4.4e+06 | 0.146 | 0.884 | -7.99e+06 | 9.27e+06 |
| Southern Methodist University | 9.695e+05 | 4.4e+06 | 0.221 | 0.826 | -7.66e+06 | 9.6e+06 |
| St. Bonaventure University | 2.381e+06 | 4.4e+06 | 0.541 | 0.588 | -6.25e+06 | 1.1e+07 |
| St. John's University | 9.283e+05 | 2.2e+06 | 0.422 | 0.673 | -3.39e+06 | 5.24e+06 |
| Stanford University | 2.971e+06 | 1.18e+06 | 2.529 | 0.012 | 6.64e+05 | 5.28e+06 |
| Syracuse University | 2.285e+06 | 1.01e+06 | 2.265 | 0.024 | 3.05e+05 | 4.26e+06 |
| Temple University | 3.373e+06 | 2.54e+06 | 1.329 | 0.184 | -1.61e+06 | 8.36e+06 |
| Tennessee Technological University | 4.736e+05 | 4.4e+06 | 0.108 | 0.914 | -8.16e+06 | 9.1e+06 |
| Texas A&M University | 5.904e+06 | 2.2e+06 | 2.686 | 0.007 | 1.59e+06 | 1.02e+07 |
| Texas Christian University | 7.45e+05 | 4.4e+06 | 0.169 | 0.865 | -7.89e+06 | 9.38e+06 |
| Texas State University | 3e+06 | 4.4e+06 | 0.682 | 0.495 | -5.63e+06 | 1.16e+07 |
| Texas Tech University | 1.45e+06 | 2.54e+06 | 0.571 | 0.568 | -3.53e+06 | 6.43e+06 |
| Tulane University | 3e+05 | 4.4e+06 | 0.068 | 0.946 | -8.33e+06 | 8.93e+06 |
| University of Alabama | 3.113e+06 | 1.97e+06 | 1.583 | 0.114 | -7.47e+05 | 6.97e+06 |
| University of Alabama at Birmingham | 4.421e+05 | 4.4e+06 | 0.101 | 0.920 | -8.19e+06 | 9.07e+06 |
| University of Arizona | 3.232e+06 | 8.62e+05 | 3.749 | 0.000 | 1.54e+06 | 4.92e+06 |
| University of Arkansas | 6.216e+06 | 1.97e+06 | 3.162 | 0.002 | 2.36e+06 | 1.01e+07 |
| University of Arkansas at Little Rock | 1.829e+05 | 4.4e+06 | 0.042 | 0.967 | -8.45e+06 | 8.81e+06 |
| University of California | 2.158e+06 | 1.47e+06 | 1.472 | 0.141 | -7.19e+05 | 5.03e+06 |
| University of California, Los Angeles | 4.272e+06 | 8.62e+05 | 4.955 | 0.000 | 2.58e+06 | 5.97e+06 |
| University of California, Santa Barbara | 7.889e+05 | 4.4e+06 | 0.179 | 0.858 | -7.84e+06 | 9.42e+06 |
| University of Central Florida | 7.25e+05 | 4.4e+06 | 0.165 | 0.869 | -7.91e+06 | 9.36e+06 |
| University of Cincinnati | 2.61e+06 | 1.55e+06 | 1.679 | 0.094 | -4.41e+05 | 5.66e+06 |
| University of Colorado | 3.078e+06 | 1.97e+06 | 1.565 | 0.118 | -7.82e+05 | 6.94e+06 |
| University of Connecticut | 4.958e+06 | 1.04e+06 | 4.784 | 0.000 | 2.92e+06 | 6.99e+06 |
| University of Detroit Mercy | 6.673e+05 | 3.11e+06 | 0.215 | 0.830 | -5.44e+06 | 6.77e+06 |
| University of Florida | 5.122e+06 | 1.14e+06 | 4.512 | 0.000 | 2.89e+06 | 7.35e+06 |
| University of Georgia | 1.6e+06 | 1.79e+06 | 0.892 | 0.373 | -1.92e+06 | 5.12e+06 |
| University of Houston | 5.861e+05 | 4.4e+06 | 0.133 | 0.894 | -8.04e+06 | 9.22e+06 |
| University of Idaho | 4.839e+05 | 4.4e+06 | 0.110 | 0.912 | -8.15e+06 | 9.11e+06 |
| University of Illinois at Urbana-Champaign | 4.223e+06 | 1.55e+06 | 2.717 | 0.007 | 1.17e+06 | 7.27e+06 |
| University of Iowa | 3.58e+06 | 3.11e+06 | 1.151 | 0.250 | -2.52e+06 | 9.68e+06 |
| University of Kansas | 3.089e+06 | 8.62e+05 | 3.583 | 0.000 | 1.4e+06 | 4.78e+06 |
| University of Kentucky | 4.162e+06 | 7.43e+05 | 5.600 | 0.000 | 2.7e+06 | 5.62e+06 |
| University of Louisiana at Lafayette | 1.452e+06 | 3.11e+06 | 0.467 | 0.641 | -4.65e+06 | 7.56e+06 |
| University of Louisiana at Monroe | 3.17e+05 | 4.4e+06 | 0.072 | 0.943 | -8.31e+06 | 8.95e+06 |
| University of Louisville | 1.022e+06 | 1.39e+06 | 0.735 | 0.463 | -1.71e+06 | 3.75e+06 |
| University of Maryland | 1.249e+06 | 1.22e+06 | 1.025 | 0.306 | -1.14e+06 | 3.64e+06 |
| University of Massachusetts Amherst | 3.797e+06 | 2.54e+06 | 1.496 | 0.135 | -1.19e+06 | 8.78e+06 |
| University of Memphis | 4.084e+06 | 1.27e+06 | 3.218 | 0.001 | 1.59e+06 | 6.58e+06 |
| University of Miami | 1.778e+06 | 1.97e+06 | 0.904 | 0.366 | -2.08e+06 | 5.64e+06 |
| University of Michigan | 3.312e+06 | 1.39e+06 | 2.382 | 0.017 | 5.83e+05 | 6.04e+06 |
| University of Minnesota | 1.482e+06 | 1.79e+06 | 0.826 | 0.409 | -2.04e+06 | 5.01e+06 |
| University of Mississippi | 6.934e+05 | 3.11e+06 | 0.223 | 0.824 | -5.41e+06 | 6.8e+06 |
| University of Missouri | 1.805e+06 | 1.79e+06 | 1.006 | 0.315 | -1.72e+06 | 5.33e+06 |
| University of Nebraska | 3.5e+06 | 4.4e+06 | 0.796 | 0.426 | -5.13e+06 | 1.21e+07 |
| University of Nevada, Las Vegas | 2.322e+06 | 1.97e+06 | 1.181 | 0.238 | -1.54e+06 | 6.18e+06 |
| University of Nevada, Reno | 1.683e+06 | 1.79e+06 | 0.938 | 0.349 | -1.84e+06 | 5.21e+06 |
| University of New Mexico | 2.61e+06 | 1.66e+06 | 1.570 | 0.117 | -6.53e+05 | 5.87e+06 |
| University of North Carolina | 3.875e+06 | 9.17e+05 | 4.227 | 0.000 | 2.08e+06 | 5.67e+06 |
| University of North Carolina at Charlotte | 9.549e+05 | 3.11e+06 | 0.307 | 0.759 | -5.15e+06 | 7.06e+06 |
| University of North Dakota | 3.669e+05 | 4.4e+06 | 0.083 | 0.934 | -8.26e+06 | 9e+06 |
| University of Notre Dame | 1.353e+06 | 1.79e+06 | 0.754 | 0.451 | -2.17e+06 | 4.88e+06 |
| University of Oklahoma | 5.779e+06 | 2.2e+06 | 2.629 | 0.009 | 1.46e+06 | 1.01e+07 |
| University of Oregon | 2.268e+06 | 1.97e+06 | 1.154 | 0.249 | -1.59e+06 | 6.13e+06 |
| University of Pittsburgh | 1.486e+06 | 1.55e+06 | 0.956 | 0.340 | -1.57e+06 | 4.54e+06 |
| University of Rhode Island | 7.147e+06 | 3.11e+06 | 2.299 | 0.022 | 1.04e+06 | 1.32e+07 |
| University of Richmond | 9.942e+04 | 4.4e+06 | 0.023 | 0.982 | -8.53e+06 | 8.73e+06 |
| University of South Carolina | 1.249e+06 | 3.11e+06 | 0.402 | 0.688 | -4.85e+06 | 7.35e+06 |
| University of South Florida | 1.037e+06 | 3.11e+06 | 0.334 | 0.739 | -5.07e+06 | 7.14e+06 |
| University of Southern California | 4.834e+06 | 1.47e+06 | 3.298 | 0.001 | 1.96e+06 | 7.71e+06 |
| University of Tennessee | 2.653e+06 | 1.66e+06 | 1.596 | 0.111 | -6.09e+05 | 5.91e+06 |
| University of Tennessee at Chattanooga | 1.037e+06 | 4.4e+06 | 0.236 | 0.814 | -7.59e+06 | 9.67e+06 |
| University of Tennessee at Martin | 1.938e+05 | 4.4e+06 | 0.044 | 0.965 | -8.44e+06 | 8.82e+06 |
| University of Texas at Austin | 5.659e+06 | 1.1e+06 | 5.148 | 0.000 | 3.5e+06 | 7.82e+06 |
| University of Texas at El Paso | 4.736e+05 | 4.4e+06 | 0.108 | 0.914 | -8.16e+06 | 9.1e+06 |
| University of Toledo | 2.875e+05 | 4.4e+06 | 0.065 | 0.948 | -8.34e+06 | 8.92e+06 |
| University of Tulsa | 6.776e+05 | 3.11e+06 | 0.218 | 0.828 | -5.43e+06 | 6.78e+06 |
| University of Utah | 3.582e+06 | 1.97e+06 | 1.822 | 0.069 | -2.78e+05 | 7.44e+06 |
| University of Virginia | 1.292e+06 | 1.97e+06 | 0.657 | 0.511 | -2.57e+06 | 5.15e+06 |
| University of Washington | 3.257e+06 | 1.33e+06 | 2.457 | 0.014 | 6.55e+05 | 5.86e+06 |
| University of West Florida | 4.2e+06 | 4.4e+06 | 0.955 | 0.340 | -4.43e+06 | 1.28e+07 |
| University of Wisconsin | 2.617e+06 | 1.79e+06 | 1.458 | 0.145 | -9.07e+05 | 6.14e+06 |
| University of Wisconsin-Green Bay | 2.42e+05 | 4.4e+06 | 0.055 | 0.956 | -8.39e+06 | 8.87e+06 |
| University of the Pacific | 1.103e+06 | 4.4e+06 | 0.251 | 0.802 | -7.53e+06 | 9.73e+06 |
| Valparaiso University | 1.183e+06 | 4.4e+06 | 0.269 | 0.788 | -7.45e+06 | 9.81e+06 |
| Vanderbilt University | 8.851e+05 | 2.2e+06 | 0.403 | 0.687 | -3.43e+06 | 5.2e+06 |
| Villanova University | 2.686e+06 | 1.39e+06 | 1.932 | 0.054 | -4.36e+04 | 5.41e+06 |
| Virginia Commonwealth University | 6.258e+06 | 3.11e+06 | 2.013 | 0.044 | 1.56e+05 | 1.24e+07 |
| Virginia Polytechnic Institute and State University | 1e+05 | 4.4e+06 | 0.023 | 0.982 | -8.53e+06 | 8.73e+06 |
| Wake Forest University | 7.027e+06 | 1.66e+06 | 4.229 | 0.000 | 3.76e+06 | 1.03e+07 |
| Walsh University | 3.988e+05 | 4.4e+06 | 0.091 | 0.928 | -8.23e+06 | 9.03e+06 |
| Washington State University | 5.599e+06 | 2.54e+06 | 2.206 | 0.028 | 6.16e+05 | 1.06e+07 |
| Weber State University | 4.236e+06 | 4.4e+06 | 0.964 | 0.336 | -4.39e+06 | 1.29e+07 |
| West Virginia University | 1.819e+06 | 3.11e+06 | 0.585 | 0.559 | -4.28e+06 | 7.92e+06 |
| Western Carolina University | 1.152e+07 | 4.4e+06 | 2.620 | 0.009 | 2.89e+06 | 2.02e+07 |
| Western Kentucky University | 2.142e+06 | 2.54e+06 | 0.844 | 0.399 | -2.84e+06 | 7.12e+06 |
| Wichita State University | 8.451e+05 | 4.4e+06 | 0.192 | 0.848 | -7.79e+06 | 9.48e+06 |
| Wright State University | 4.762e+06 | 4.4e+06 | 1.083 | 0.279 | -3.87e+06 | 1.34e+07 |
| Xavier University | 3.121e+06 | 1.79e+06 | 1.739 | 0.082 | -4.02e+05 | 6.64e+06 |
| Omnibus: | 395.719 | Durbin-Watson: | 1.476 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 1533.982 |
| Skew: | 1.995 | Prob(JB): | 0.00 |
| Kurtosis: | 7.761 | Cond. No. | 13.0 |
lm.summary2().tables[1]['P>|t|']
Arizona State University 0.007559
Auburn University 0.822833
Augsburg College 0.716022
Austin Peay State University 0.458664
Ball State University 0.630619
...
Western Carolina University 0.008961
Western Kentucky University 0.399051
Wichita State University 0.847632
Wright State University 0.279052
Xavier University 0.082459
Name: P>|t|, Length: 174, dtype: float64
The p-values for the colleges vary greatly, with some far below 0.05 and others far above this level. As a result, we can reject the null hypothesis that no linear association exists between a college and salary for only a handful of colleges; for the vast majority, we fail to reject it. This means that our initial claim was actually incorrect!
Furthermore, the R-squared value of 0.155 indicates that there is not much of a linear correlation between college and salary. This suggests that, in general, using NCAA college as a predictor of NBA income may not be the best option.
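A quick way to see how few colleges clear the 0.05 threshold is to filter the p-value series from the fitted model (`lm.summary2().tables[1]['P>|t|']`). A sketch on a tiny hand-picked subset of the p-values from the table above:

```python
import pandas as pd

# a few p-values copied from the regression output above,
# standing in for the full lm.summary2().tables[1]['P>|t|'] series
p_values = pd.Series({
    'Duke University': 0.000,
    'Auburn University': 0.823,
    'Butler University': 0.004,
})

# keep only colleges whose null hypothesis we can reject at the 0.05 level
significant = p_values[p_values < 0.05]
```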
Relationship between Points Per Game and Salary
There is a potential relationship between salary and points per game in the NBA. Generally, players who score more points per game are considered to be more valuable to their team and may command higher salaries as a result. This is because scoring points is an important aspect of the game of basketball and players who are able to consistently put points on the board are often seen as being more valuable to their team's success. However, it is important to note that salary in the NBA is also influenced by a variety of other factors, such as a player's defensive ability, their overall skill level, and their marketability. As such, the relationship between salary and points per game may not always be straightforward.
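Before fitting anything, a lightweight first check of such a relationship is a Pearson correlation. Here is a sketch on made-up numbers; in our notebook you would pass `salaries['PTS/G']` and `salaries['Salary']` instead:

```python
import numpy as np
from scipy.stats import pearsonr

# toy stand-ins for points per game and salary
pts_per_game = np.array([5.0, 8.0, 12.0, 18.0, 25.0])
salary = np.array([1.2e6, 2.0e6, 4.5e6, 9.0e6, 15.0e6])

r, p = pearsonr(pts_per_game, salary)
# r close to 1 indicates a strong positive linear relationship
```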
Let's fit a linear regression with points per game as the independent variable and yearly salary as the dependent variable to see the correlation strength between the two variables. We can do this with an ANOVA test and OLS regression.
Our null hypothesis will be that there is no difference between the true salaries for all the populations, while our alternative is that there is at least one population that differs.
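The ANOVA half of this test can be sketched with scipy's `f_oneway`, which compares group means directly. The salary figures and scoring tiers below are synthetic, purely to illustrate the shape of the test:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Synthetic salaries (in dollars) for three scoring tiers -- illustrative only
low_scorers = rng.normal(2e6, 5e5, size=50)     # roughly 5 PTS/G
mid_scorers = rng.normal(5e6, 1e6, size=50)     # roughly 15 PTS/G
high_scorers = rng.normal(1.5e7, 3e6, size=50)  # roughly 25 PTS/G

# One-way ANOVA: H0 is that all three tiers share the same mean salary
f_stat, p_value = f_oneway(low_scorers, mid_scorers, high_scorers)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
if p_value < 0.05:
    print("Reject H0: at least one tier's mean salary differs")
```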
# One-hot encode points per game, treating each distinct value as its own category
X = pd.get_dummies(salaries["PTS/G"])
# Extract the dependent variable from the dataframe
y = salaries["Salary"]
# Fit the linear regression model using OLS
lm = sm.OLS(y, X).fit()
lm.summary2().tables[0].iloc[0, 2:]
Adj. R-squared:    0.343
Name: 0, dtype: object
lm.summary2().tables[1]['P>|t|']
0.000000 3.872958e-01
0.133333 6.377304e-01
0.200000 6.512388e-01
0.222222 8.734951e-01
0.285714 6.362034e-01
...
26.892308 1.140910e-13
28.159420 3.841185e-15
28.180556 1.575067e-21
28.975610 4.146788e-15
30.063291 2.969530e-09
Name: P>|t|, Length: 803, dtype: float64
As seen from the results of the fitting above, the uncentered R-squared value of 0.972 suggests a strong relationship between points per game and salary. That figure should be read with some caution, however: one-hot encoding 803 distinct PTS/G values gives the model enormous flexibility, and the adjusted R-squared of 0.343 is far more modest. Even so, points per game appears to be one of the stronger predictors of NBA income among the statistics we tested.
Relationship between Field Goal Percentage and Salary
Now let's revisit another of the earlier claims and test it with the same ANOVA test and OLS regression.
Our null hypothesis will be that there is no difference between the true salaries for all the populations, while our alternative is that there is at least one population that differs.
# One-hot encode field goal percentage, treating each distinct value as its own category
X = pd.get_dummies(salaries["FG%"])
# Extract the dependent variable from the dataframe
y = salaries["Salary"]
# Fit the linear regression model using OLS
lm = sm.OLS(y, X).fit()
lm.summary()
| Dep. Variable: | Salary | R-squared (uncentered): | 0.661 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.515 |
| Method: | Least Squares | F-statistic: | 4.513 |
| Date: | Sat, 17 Dec 2022 | Prob (F-statistic): | 7.06e-58 |
| Time: | 01:05:35 | Log-Likelihood: | -15641. |
| No. Observations: | 954 | AIC: | 3.186e+04 |
| Df Residuals: | 666 | BIC: | 3.326e+04 |
| Df Model: | 288 | ||
| Covariance Type: | nonrobust |
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| 0.0 | 6.898e+05 | 1.1e+06 | 0.625 | 0.532 | -1.48e+06 | 2.86e+06 |
| 0.083 | 1.232e+06 | 2.7e+06 | 0.456 | 0.649 | -4.07e+06 | 6.54e+06 |
| 0.1 | 2.025e+05 | 3.82e+06 | 0.053 | 0.958 | -7.3e+06 | 7.7e+06 |
| 0.118 | 1.271e+06 | 3.82e+06 | 0.333 | 0.739 | -6.23e+06 | 8.77e+06 |
| 0.125 | 5.69e+05 | 2.7e+06 | 0.211 | 0.833 | -4.74e+06 | 5.87e+06 |
| 0.143 | 4e+05 | 3.82e+06 | 0.105 | 0.917 | -7.1e+06 | 7.9e+06 |
| 0.15 | 9.575e+05 | 3.82e+06 | 0.251 | 0.802 | -6.54e+06 | 8.46e+06 |
| 0.154 | 2.459e+05 | 3.82e+06 | 0.064 | 0.949 | -7.26e+06 | 7.75e+06 |
| 0.158 | 1.098e+06 | 3.82e+06 | 0.287 | 0.774 | -6.4e+06 | 8.6e+06 |
| 0.167 | 1.378e+06 | 1.91e+06 | 0.721 | 0.471 | -2.37e+06 | 5.13e+06 |
| 0.182 | 1.53e+06 | 2.7e+06 | 0.566 | 0.571 | -3.77e+06 | 6.83e+06 |
| 0.19 | 1.19e+05 | 3.82e+06 | 0.031 | 0.975 | -7.38e+06 | 7.62e+06 |
| 0.2 | 5.213e+05 | 1.44e+06 | 0.361 | 0.718 | -2.31e+06 | 3.36e+06 |
| 0.205 | 1.057e+06 | 3.82e+06 | 0.277 | 0.782 | -6.44e+06 | 8.56e+06 |
| 0.222 | 1.285e+06 | 3.82e+06 | 0.336 | 0.737 | -6.22e+06 | 8.79e+06 |
| 0.231 | 1.082e+06 | 3.82e+06 | 0.283 | 0.777 | -6.42e+06 | 8.58e+06 |
| 0.235 | 1.433e+05 | 3.82e+06 | 0.038 | 0.970 | -7.36e+06 | 7.64e+06 |
| 0.238 | 1.073e+06 | 3.82e+06 | 0.281 | 0.779 | -6.43e+06 | 8.57e+06 |
| 0.242 | 1.075e+06 | 3.82e+06 | 0.281 | 0.779 | -6.43e+06 | 8.58e+06 |
| 0.25 | 6.133e+05 | 1.27e+06 | 0.482 | 0.630 | -1.89e+06 | 3.11e+06 |
| 0.255 | 1.352e+06 | 3.82e+06 | 0.354 | 0.723 | -6.15e+06 | 8.85e+06 |
| 0.265 | 5.251e+05 | 3.82e+06 | 0.137 | 0.891 | -6.98e+06 | 8.03e+06 |
| 0.27 | 1.317e+06 | 3.82e+06 | 0.345 | 0.730 | -6.18e+06 | 8.82e+06 |
| 0.273 | 9.235e+05 | 2.7e+06 | 0.342 | 0.733 | -4.38e+06 | 6.23e+06 |
| 0.274 | 1.824e+06 | 3.82e+06 | 0.478 | 0.633 | -5.68e+06 | 9.33e+06 |
| 0.278 | 4.159e+05 | 2.7e+06 | 0.154 | 0.878 | -4.89e+06 | 5.72e+06 |
| 0.286 | 3.945e+06 | 1.56e+06 | 2.529 | 0.012 | 8.83e+05 | 7.01e+06 |
| 0.289 | 1.376e+06 | 3.82e+06 | 0.360 | 0.719 | -6.13e+06 | 8.88e+06 |
| 0.291 | 8.324e+05 | 3.82e+06 | 0.218 | 0.828 | -6.67e+06 | 8.33e+06 |
| 0.293 | 3.415e+06 | 3.82e+06 | 0.894 | 0.372 | -4.09e+06 | 1.09e+07 |
| 0.294 | 2.911e+06 | 3.82e+06 | 0.762 | 0.446 | -4.59e+06 | 1.04e+07 |
| 0.296 | 1.46e+05 | 3.82e+06 | 0.038 | 0.970 | -7.36e+06 | 7.65e+06 |
| 0.3 | 8.518e+05 | 1.91e+06 | 0.446 | 0.656 | -2.9e+06 | 4.6e+06 |
| 0.302 | 3.17e+05 | 3.82e+06 | 0.083 | 0.934 | -7.18e+06 | 7.82e+06 |
| 0.304 | 1.916e+06 | 2.7e+06 | 0.709 | 0.478 | -3.39e+06 | 7.22e+06 |
| 0.305 | 1.733e+06 | 3.82e+06 | 0.454 | 0.650 | -5.77e+06 | 9.23e+06 |
| 0.306 | 4.211e+06 | 2.21e+06 | 1.909 | 0.057 | -1.2e+05 | 8.54e+06 |
| 0.307 | 7.115e+05 | 3.82e+06 | 0.186 | 0.852 | -6.79e+06 | 8.21e+06 |
| 0.308 | 8.38e+05 | 2.7e+06 | 0.310 | 0.756 | -4.47e+06 | 6.14e+06 |
| 0.309 | 9.695e+05 | 3.82e+06 | 0.254 | 0.800 | -6.53e+06 | 8.47e+06 |
| 0.31 | 6.125e+05 | 2.7e+06 | 0.227 | 0.821 | -4.69e+06 | 5.92e+06 |
| 0.313 | 6.387e+05 | 3.82e+06 | 0.167 | 0.867 | -6.86e+06 | 8.14e+06 |
| 0.314 | 5.1e+06 | 3.82e+06 | 1.335 | 0.182 | -2.4e+06 | 1.26e+07 |
| 0.315 | 4.272e+05 | 3.82e+06 | 0.112 | 0.911 | -7.07e+06 | 7.93e+06 |
| 0.318 | 8.165e+05 | 3.82e+06 | 0.214 | 0.831 | -6.68e+06 | 8.32e+06 |
| 0.319 | 1.17e+06 | 2.7e+06 | 0.433 | 0.665 | -4.13e+06 | 6.47e+06 |
| 0.32 | 5.893e+05 | 2.7e+06 | 0.218 | 0.827 | -4.71e+06 | 5.89e+06 |
| 0.322 | 3.17e+05 | 3.82e+06 | 0.083 | 0.934 | -7.18e+06 | 7.82e+06 |
| 0.323 | 1.29e+06 | 2.21e+06 | 0.585 | 0.559 | -3.04e+06 | 5.62e+06 |
| 0.325 | 1.421e+06 | 2.21e+06 | 0.644 | 0.519 | -2.91e+06 | 5.75e+06 |
| 0.327 | 2.644e+06 | 3.82e+06 | 0.692 | 0.489 | -4.86e+06 | 1.01e+07 |
| 0.328 | 2.422e+05 | 3.82e+06 | 0.063 | 0.949 | -7.26e+06 | 7.74e+06 |
| 0.329 | 6.709e+05 | 2.7e+06 | 0.248 | 0.804 | -4.63e+06 | 5.98e+06 |
| 0.333 | 7.369e+05 | 9e+05 | 0.818 | 0.413 | -1.03e+06 | 2.5e+06 |
| 0.336 | 4.5e+05 | 3.82e+06 | 0.118 | 0.906 | -7.05e+06 | 7.95e+06 |
| 0.338 | 2.23e+06 | 2.7e+06 | 0.826 | 0.409 | -3.07e+06 | 7.53e+06 |
| 0.339 | 1.951e+06 | 1.91e+06 | 1.021 | 0.307 | -1.8e+06 | 5.7e+06 |
| 0.34 | 2.248e+06 | 2.21e+06 | 1.019 | 0.308 | -2.08e+06 | 6.58e+06 |
| 0.341 | 1.489e+06 | 1.56e+06 | 0.955 | 0.340 | -1.57e+06 | 4.55e+06 |
| 0.342 | 5.764e+05 | 3.82e+06 | 0.151 | 0.880 | -6.92e+06 | 8.08e+06 |
| 0.343 | 4.3e+06 | 3.82e+06 | 1.126 | 0.261 | -3.2e+06 | 1.18e+07 |
| 0.344 | 1.015e+06 | 2.21e+06 | 0.460 | 0.645 | -3.32e+06 | 5.35e+06 |
| 0.347 | 5.287e+05 | 2.7e+06 | 0.196 | 0.845 | -4.78e+06 | 5.83e+06 |
| 0.348 | 2.317e+06 | 2.21e+06 | 1.050 | 0.294 | -2.01e+06 | 6.65e+06 |
| 0.349 | 2.329e+06 | 2.7e+06 | 0.862 | 0.389 | -2.98e+06 | 7.63e+06 |
| 0.35 | 1.89e+06 | 1.91e+06 | 0.990 | 0.323 | -1.86e+06 | 5.64e+06 |
| 0.351 | 3.112e+06 | 2.21e+06 | 1.411 | 0.159 | -1.22e+06 | 7.44e+06 |
| 0.352 | 4.921e+06 | 2.21e+06 | 2.231 | 0.026 | 5.9e+05 | 9.25e+06 |
| 0.353 | 1.154e+06 | 2.7e+06 | 0.427 | 0.669 | -4.15e+06 | 6.46e+06 |
| 0.354 | 4.131e+05 | 1.71e+06 | 0.242 | 0.809 | -2.94e+06 | 3.77e+06 |
| 0.355 | 3.495e+05 | 3.82e+06 | 0.091 | 0.927 | -7.15e+06 | 7.85e+06 |
| 0.356 | 5.324e+06 | 3.82e+06 | 1.394 | 0.164 | -2.18e+06 | 1.28e+07 |
| 0.357 | 9.484e+05 | 2.21e+06 | 0.430 | 0.667 | -3.38e+06 | 5.28e+06 |
| 0.358 | 2.5e+07 | 3.82e+06 | 6.544 | 0.000 | 1.75e+07 | 3.25e+07 |
| 0.359 | 4.453e+06 | 2.21e+06 | 2.019 | 0.044 | 1.22e+05 | 8.78e+06 |
| 0.36 | 9.473e+05 | 3.82e+06 | 0.248 | 0.804 | -6.55e+06 | 8.45e+06 |
| 0.361 | 6.83e+05 | 1.56e+06 | 0.438 | 0.662 | -2.38e+06 | 3.75e+06 |
| 0.364 | 1.279e+06 | 1.44e+06 | 0.886 | 0.376 | -1.56e+06 | 4.11e+06 |
| 0.365 | 3.417e+05 | 2.21e+06 | 0.155 | 0.877 | -3.99e+06 | 4.67e+06 |
| 0.366 | 2.089e+06 | 2.7e+06 | 0.773 | 0.440 | -3.22e+06 | 7.39e+06 |
| 0.367 | 8.057e+05 | 1.35e+06 | 0.597 | 0.551 | -1.85e+06 | 3.46e+06 |
| 0.368 | 2.407e+06 | 1.91e+06 | 1.260 | 0.208 | -1.34e+06 | 6.16e+06 |
| 0.369 | 1.976e+06 | 2.7e+06 | 0.731 | 0.465 | -3.33e+06 | 7.28e+06 |
| 0.37 | 8.858e+05 | 2.7e+06 | 0.328 | 0.743 | -4.42e+06 | 6.19e+06 |
| 0.371 | 9.819e+05 | 1.91e+06 | 0.514 | 0.607 | -2.77e+06 | 4.73e+06 |
| 0.372 | 1.568e+06 | 2.7e+06 | 0.580 | 0.562 | -3.74e+06 | 6.87e+06 |
| 0.373 | 4.164e+06 | 2.21e+06 | 1.888 | 0.059 | -1.66e+05 | 8.5e+06 |
| 0.374 | 7.313e+06 | 1.71e+06 | 4.281 | 0.000 | 3.96e+06 | 1.07e+07 |
| 0.375 | 1.944e+06 | 1.44e+06 | 1.346 | 0.179 | -8.92e+05 | 4.78e+06 |
| 0.376 | 3.685e+06 | 1.71e+06 | 2.157 | 0.031 | 3.3e+05 | 7.04e+06 |
| 0.377 | 3.135e+06 | 3.82e+06 | 0.821 | 0.412 | -4.37e+06 | 1.06e+07 |
| 0.378 | 5.375e+05 | 2.7e+06 | 0.199 | 0.842 | -4.77e+06 | 5.84e+06 |
| 0.379 | 7.331e+05 | 1.91e+06 | 0.384 | 0.701 | -3.02e+06 | 4.48e+06 |
| 0.38 | 6.667e+06 | 2.21e+06 | 3.023 | 0.003 | 2.34e+06 | 1.1e+07 |
| 0.381 | 4.883e+05 | 2.21e+06 | 0.221 | 0.825 | -3.84e+06 | 4.82e+06 |
| 0.382 | 8.974e+05 | 1.71e+06 | 0.525 | 0.600 | -2.46e+06 | 4.25e+06 |
| 0.383 | 8.563e+06 | 2.7e+06 | 3.170 | 0.002 | 3.26e+06 | 1.39e+07 |
| 0.384 | 2.715e+06 | 2.7e+06 | 1.005 | 0.315 | -2.59e+06 | 8.02e+06 |
| 0.385 | 1.962e+06 | 1.35e+06 | 1.453 | 0.147 | -6.9e+05 | 4.61e+06 |
| 0.386 | 3.205e+06 | 1.71e+06 | 1.876 | 0.061 | -1.49e+05 | 6.56e+06 |
| 0.387 | 7.55e+06 | 1.71e+06 | 4.419 | 0.000 | 4.19e+06 | 1.09e+07 |
| 0.389 | 1.178e+06 | 1.91e+06 | 0.617 | 0.538 | -2.57e+06 | 4.93e+06 |
| 0.39 | 2.551e+06 | 1.91e+06 | 1.336 | 0.182 | -1.2e+06 | 6.3e+06 |
| 0.391 | 2.364e+06 | 1.91e+06 | 1.238 | 0.216 | -1.39e+06 | 6.11e+06 |
| 0.392 | 2.434e+06 | 1.91e+06 | 1.274 | 0.203 | -1.32e+06 | 6.18e+06 |
| 0.393 | 6.828e+05 | 2.21e+06 | 0.310 | 0.757 | -3.65e+06 | 5.01e+06 |
| 0.394 | 1.221e+06 | 1.56e+06 | 0.783 | 0.434 | -1.84e+06 | 4.28e+06 |
| 0.395 | 1.627e+06 | 2.21e+06 | 0.738 | 0.461 | -2.7e+06 | 5.96e+06 |
| 0.396 | 1.398e+06 | 1.91e+06 | 0.732 | 0.465 | -2.35e+06 | 5.15e+06 |
| 0.397 | 7.248e+05 | 2.21e+06 | 0.329 | 0.743 | -3.61e+06 | 5.06e+06 |
| 0.398 | 3.234e+06 | 1.91e+06 | 1.693 | 0.091 | -5.17e+05 | 6.98e+06 |
| 0.399 | 2.67e+06 | 2.7e+06 | 0.988 | 0.323 | -2.63e+06 | 7.97e+06 |
| 0.4 | 2.349e+06 | 8.34e+05 | 2.818 | 0.005 | 7.12e+05 | 3.99e+06 |
| 0.401 | 3.411e+06 | 1.71e+06 | 1.996 | 0.046 | 5.6e+04 | 6.77e+06 |
| 0.402 | 2.5e+06 | 3.82e+06 | 0.654 | 0.513 | -5e+06 | 1e+07 |
| 0.403 | 8.39e+06 | 1.91e+06 | 4.392 | 0.000 | 4.64e+06 | 1.21e+07 |
| 0.404 | 1.519e+06 | 1.56e+06 | 0.974 | 0.331 | -1.54e+06 | 4.58e+06 |
| 0.405 | 1.969e+06 | 1.56e+06 | 1.263 | 0.207 | -1.09e+06 | 5.03e+06 |
| 0.406 | 1.077e+06 | 1.71e+06 | 0.631 | 0.529 | -2.28e+06 | 4.43e+06 |
| 0.407 | 2.616e+06 | 1.71e+06 | 1.531 | 0.126 | -7.39e+05 | 5.97e+06 |
| 0.408 | 9.027e+05 | 1.71e+06 | 0.528 | 0.597 | -2.45e+06 | 4.26e+06 |
| 0.409 | 5.388e+06 | 1.91e+06 | 2.821 | 0.005 | 1.64e+06 | 9.14e+06 |
| 0.41 | 5.225e+06 | 1.35e+06 | 3.868 | 0.000 | 2.57e+06 | 7.88e+06 |
| 0.411 | 3.801e+06 | 1.91e+06 | 1.990 | 0.047 | 5.08e+04 | 7.55e+06 |
| 0.412 | 3.726e+06 | 1.56e+06 | 2.389 | 0.017 | 6.64e+05 | 6.79e+06 |
| 0.413 | 3.311e+06 | 1.35e+06 | 2.451 | 0.014 | 6.59e+05 | 5.96e+06 |
| 0.414 | 1.012e+06 | 1.71e+06 | 0.592 | 0.554 | -2.34e+06 | 4.37e+06 |
| 0.415 | 6.176e+06 | 1.44e+06 | 4.277 | 0.000 | 3.34e+06 | 9.01e+06 |
| 0.416 | 4.54e+06 | 1.71e+06 | 2.657 | 0.008 | 1.19e+06 | 7.89e+06 |
| 0.417 | 3.015e+06 | 1.44e+06 | 2.088 | 0.037 | 1.8e+05 | 5.85e+06 |
| 0.418 | 1.016e+07 | 2.7e+06 | 3.762 | 0.000 | 4.86e+06 | 1.55e+07 |
| 0.419 | 4.639e+06 | 1.44e+06 | 3.213 | 0.001 | 1.8e+06 | 7.47e+06 |
| 0.42 | 1.844e+06 | 2.21e+06 | 0.836 | 0.403 | -2.49e+06 | 6.18e+06 |
| 0.421 | 3.884e+06 | 1.27e+06 | 3.050 | 0.002 | 1.38e+06 | 6.38e+06 |
| 0.422 | 2.915e+06 | 1.44e+06 | 2.019 | 0.044 | 7.98e+04 | 5.75e+06 |
| 0.423 | 3.001e+06 | 1.71e+06 | 1.756 | 0.079 | -3.54e+05 | 6.36e+06 |
| 0.424 | 6.234e+06 | 2.21e+06 | 2.826 | 0.005 | 1.9e+06 | 1.06e+07 |
| 0.425 | 2.122e+06 | 2.7e+06 | 0.785 | 0.433 | -3.18e+06 | 7.43e+06 |
| 0.426 | 5.971e+06 | 2.21e+06 | 2.707 | 0.007 | 1.64e+06 | 1.03e+07 |
| 0.427 | 6.412e+06 | 1.44e+06 | 4.441 | 0.000 | 3.58e+06 | 9.25e+06 |
| 0.428 | 4.166e+06 | 2.7e+06 | 1.542 | 0.123 | -1.14e+06 | 9.47e+06 |
| 0.429 | 3.071e+06 | 1.1e+06 | 2.784 | 0.006 | 9.05e+05 | 5.24e+06 |
| 0.43 | 2.829e+06 | 2.21e+06 | 1.282 | 0.200 | -1.5e+06 | 7.16e+06 |
| 0.431 | 3.542e+06 | 1.35e+06 | 2.622 | 0.009 | 8.9e+05 | 6.19e+06 |
| 0.432 | 3.399e+06 | 1.56e+06 | 2.179 | 0.030 | 3.36e+05 | 6.46e+06 |
| 0.433 | 4.77e+06 | 1.15e+06 | 4.141 | 0.000 | 2.51e+06 | 7.03e+06 |
| 0.434 | 8.5e+06 | 2.21e+06 | 3.854 | 0.000 | 4.17e+06 | 1.28e+07 |
| 0.435 | 3.873e+06 | 1.71e+06 | 2.267 | 0.024 | 5.19e+05 | 7.23e+06 |
| 0.436 | 6.646e+06 | 1.44e+06 | 4.603 | 0.000 | 3.81e+06 | 9.48e+06 |
| 0.437 | 7.359e+06 | 2.21e+06 | 3.336 | 0.001 | 3.03e+06 | 1.17e+07 |
| 0.438 | 3.859e+06 | 1.56e+06 | 2.474 | 0.014 | 7.97e+05 | 6.92e+06 |
| 0.439 | 7.579e+06 | 1.71e+06 | 4.436 | 0.000 | 4.22e+06 | 1.09e+07 |
| 0.44 | 1.244e+06 | 1.35e+06 | 0.921 | 0.357 | -1.41e+06 | 3.9e+06 |
| 0.441 | 5.579e+06 | 1.35e+06 | 4.131 | 0.000 | 2.93e+06 | 8.23e+06 |
| 0.442 | 1.128e+06 | 1.71e+06 | 0.660 | 0.509 | -2.23e+06 | 4.48e+06 |
| 0.443 | 2.672e+06 | 1.91e+06 | 1.399 | 0.162 | -1.08e+06 | 6.42e+06 |
| 0.444 | 9.753e+05 | 1.56e+06 | 0.625 | 0.532 | -2.09e+06 | 4.04e+06 |
| 0.445 | 3.241e+06 | 1.71e+06 | 1.897 | 0.058 | -1.14e+05 | 6.6e+06 |
| 0.446 | 4.363e+06 | 1.27e+06 | 3.426 | 0.001 | 1.86e+06 | 6.86e+06 |
| 0.447 | 2.558e+06 | 1.27e+06 | 2.009 | 0.045 | 5.74e+04 | 5.06e+06 |
| 0.448 | 4.872e+06 | 1.35e+06 | 3.607 | 0.000 | 2.22e+06 | 7.52e+06 |
| 0.449 | 2.328e+06 | 1.56e+06 | 1.493 | 0.136 | -7.35e+05 | 5.39e+06 |
| 0.45 | 5.308e+06 | 1.71e+06 | 3.107 | 0.002 | 1.95e+06 | 8.66e+06 |
| 0.451 | 7.935e+06 | 1.56e+06 | 5.088 | 0.000 | 4.87e+06 | 1.1e+07 |
| 0.452 | 3.434e+06 | 1.06e+06 | 3.241 | 0.001 | 1.35e+06 | 5.51e+06 |
| 0.453 | 5.93e+06 | 1.71e+06 | 3.471 | 0.001 | 2.58e+06 | 9.28e+06 |
| 0.454 | 1.064e+07 | 1.71e+06 | 6.230 | 0.000 | 7.29e+06 | 1.4e+07 |
| 0.455 | 5.393e+06 | 1.56e+06 | 3.458 | 0.001 | 2.33e+06 | 8.46e+06 |
| 0.456 | 1.291e+07 | 2.21e+06 | 5.853 | 0.000 | 8.58e+06 | 1.72e+07 |
| 0.457 | 1.967e+06 | 2.21e+06 | 0.892 | 0.373 | -2.36e+06 | 6.3e+06 |
| 0.458 | 6.226e+06 | 1.35e+06 | 4.610 | 0.000 | 3.57e+06 | 8.88e+06 |
| 0.459 | 2.959e+06 | 1.56e+06 | 1.897 | 0.058 | -1.04e+05 | 6.02e+06 |
| 0.46 | 5.928e+05 | 2.7e+06 | 0.219 | 0.826 | -4.71e+06 | 5.9e+06 |
| 0.461 | 6.453e+06 | 3.82e+06 | 1.689 | 0.092 | -1.05e+06 | 1.4e+07 |
| 0.462 | 5.415e+06 | 1.71e+06 | 3.170 | 0.002 | 2.06e+06 | 8.77e+06 |
| 0.463 | 1.335e+06 | 2.21e+06 | 0.605 | 0.545 | -3e+06 | 5.67e+06 |
| 0.464 | 7.666e+06 | 1.91e+06 | 4.013 | 0.000 | 3.92e+06 | 1.14e+07 |
| 0.465 | 3.681e+06 | 1.56e+06 | 2.360 | 0.019 | 6.18e+05 | 6.74e+06 |
| 0.466 | 6.754e+06 | 1.27e+06 | 5.304 | 0.000 | 4.25e+06 | 9.25e+06 |
| 0.467 | 5.515e+06 | 1.71e+06 | 3.228 | 0.001 | 2.16e+06 | 8.87e+06 |
| 0.468 | 2.403e+06 | 1.91e+06 | 1.258 | 0.209 | -1.35e+06 | 6.15e+06 |
| 0.469 | 4.518e+06 | 2.7e+06 | 1.673 | 0.095 | -7.86e+05 | 9.82e+06 |
| 0.47 | 1.062e+07 | 2.7e+06 | 3.933 | 0.000 | 5.32e+06 | 1.59e+07 |
| 0.471 | 4.274e+06 | 1.56e+06 | 2.741 | 0.006 | 1.21e+06 | 7.34e+06 |
| 0.472 | 1.084e+07 | 3.82e+06 | 2.838 | 0.005 | 3.34e+06 | 1.83e+07 |
| 0.473 | 4.797e+06 | 1.91e+06 | 2.511 | 0.012 | 1.05e+06 | 8.55e+06 |
| 0.476 | 4.33e+06 | 1.35e+06 | 3.206 | 0.001 | 1.68e+06 | 6.98e+06 |
| 0.477 | 8.11e+06 | 2.7e+06 | 3.002 | 0.003 | 2.81e+06 | 1.34e+07 |
| 0.478 | 1.052e+06 | 2.21e+06 | 0.477 | 0.634 | -3.28e+06 | 5.38e+06 |
| 0.479 | 1.235e+07 | 3.82e+06 | 3.233 | 0.001 | 4.85e+06 | 1.99e+07 |
| 0.48 | 4.708e+06 | 2.7e+06 | 1.743 | 0.082 | -5.96e+05 | 1e+07 |
| 0.481 | 1.293e+06 | 2.7e+06 | 0.479 | 0.632 | -4.01e+06 | 6.6e+06 |
| 0.482 | 5e+06 | 3.82e+06 | 1.309 | 0.191 | -2.5e+06 | 1.25e+07 |
| 0.483 | 2.136e+06 | 2.7e+06 | 0.791 | 0.429 | -3.17e+06 | 7.44e+06 |
| 0.485 | 4.895e+06 | 2.21e+06 | 2.219 | 0.027 | 5.64e+05 | 9.23e+06 |
| 0.486 | 1.769e+07 | 3.82e+06 | 4.630 | 0.000 | 1.02e+07 | 2.52e+07 |
| 0.487 | 6.395e+06 | 2.21e+06 | 2.900 | 0.004 | 2.06e+06 | 1.07e+07 |
| 0.488 | 5.064e+06 | 2.7e+06 | 1.875 | 0.061 | -2.4e+05 | 1.04e+07 |
| 0.489 | 6.875e+05 | 3.82e+06 | 0.180 | 0.857 | -6.81e+06 | 8.19e+06 |
| 0.49 | 7.363e+06 | 2.21e+06 | 3.338 | 0.001 | 3.03e+06 | 1.17e+07 |
| 0.491 | 5.242e+06 | 2.7e+06 | 1.941 | 0.053 | -6.22e+04 | 1.05e+07 |
| 0.492 | 1.536e+07 | 3.82e+06 | 4.021 | 0.000 | 7.86e+06 | 2.29e+07 |
| 0.493 | 6.301e+06 | 1.71e+06 | 3.688 | 0.000 | 2.95e+06 | 9.66e+06 |
| 0.495 | 1.703e+06 | 2.21e+06 | 0.772 | 0.440 | -2.63e+06 | 6.03e+06 |
| 0.496 | 3.345e+06 | 2.7e+06 | 1.238 | 0.216 | -1.96e+06 | 8.65e+06 |
| 0.498 | 2.358e+06 | 3.82e+06 | 0.617 | 0.537 | -5.14e+06 | 9.86e+06 |
| 0.499 | 1.891e+07 | 3.82e+06 | 4.949 | 0.000 | 1.14e+07 | 2.64e+07 |
| 0.5 | 1.891e+06 | 9.27e+05 | 2.041 | 0.042 | 7.19e+04 | 3.71e+06 |
| 0.501 | 2.041e+06 | 3.82e+06 | 0.534 | 0.593 | -5.46e+06 | 9.54e+06 |
| 0.504 | 5.787e+06 | 1.91e+06 | 3.030 | 0.003 | 2.04e+06 | 9.54e+06 |
| 0.505 | 1.205e+07 | 2.21e+06 | 5.465 | 0.000 | 7.72e+06 | 1.64e+07 |
| 0.506 | 6.77e+06 | 2.21e+06 | 3.070 | 0.002 | 2.44e+06 | 1.11e+07 |
| 0.507 | 2.832e+06 | 2.21e+06 | 1.284 | 0.200 | -1.5e+06 | 7.16e+06 |
| 0.508 | 2.086e+06 | 1.91e+06 | 1.092 | 0.275 | -1.66e+06 | 5.84e+06 |
| 0.509 | 1.499e+06 | 3.82e+06 | 0.392 | 0.695 | -6e+06 | 9e+06 |
| 0.51 | 6.875e+06 | 2.7e+06 | 2.545 | 0.011 | 1.57e+06 | 1.22e+07 |
| 0.511 | 1.276e+07 | 2.7e+06 | 4.723 | 0.000 | 7.45e+06 | 1.81e+07 |
| 0.512 | 1.3e+06 | 3.82e+06 | 0.340 | 0.734 | -6.2e+06 | 8.8e+06 |
| 0.513 | 3.422e+06 | 1.71e+06 | 2.003 | 0.046 | 6.72e+04 | 6.78e+06 |
| 0.514 | 4.623e+06 | 2.21e+06 | 2.096 | 0.036 | 2.92e+05 | 8.95e+06 |
| 0.515 | 1.193e+07 | 3.82e+06 | 3.124 | 0.002 | 4.43e+06 | 1.94e+07 |
| 0.516 | 3.7e+06 | 3.82e+06 | 0.969 | 0.333 | -3.8e+06 | 1.12e+07 |
| 0.517 | 6.363e+05 | 2.7e+06 | 0.236 | 0.814 | -4.67e+06 | 5.94e+06 |
| 0.518 | 6.508e+05 | 2.7e+06 | 0.241 | 0.810 | -4.65e+06 | 5.95e+06 |
| 0.52 | 2.297e+07 | 3.82e+06 | 6.013 | 0.000 | 1.55e+07 | 3.05e+07 |
| 0.521 | 3.436e+06 | 2.21e+06 | 1.558 | 0.120 | -8.94e+05 | 7.77e+06 |
| 0.522 | 1.284e+06 | 1.91e+06 | 0.672 | 0.502 | -2.47e+06 | 5.03e+06 |
| 0.523 | 8.5e+06 | 3.82e+06 | 2.225 | 0.026 | 9.99e+05 | 1.6e+07 |
| 0.524 | 1.229e+06 | 2.21e+06 | 0.557 | 0.578 | -3.1e+06 | 5.56e+06 |
| 0.525 | 2.057e+06 | 3.82e+06 | 0.538 | 0.590 | -5.44e+06 | 9.56e+06 |
| 0.526 | 3.152e+06 | 2.21e+06 | 1.429 | 0.153 | -1.18e+06 | 7.48e+06 |
| 0.527 | 7.622e+05 | 3.82e+06 | 0.200 | 0.842 | -6.74e+06 | 8.26e+06 |
| 0.528 | 1.679e+06 | 3.82e+06 | 0.439 | 0.660 | -5.82e+06 | 9.18e+06 |
| 0.529 | 1.981e+06 | 1.91e+06 | 1.037 | 0.300 | -1.77e+06 | 5.73e+06 |
| 0.531 | 4.538e+06 | 3.82e+06 | 1.188 | 0.235 | -2.96e+06 | 1.2e+07 |
| 0.532 | 1.036e+06 | 2.21e+06 | 0.470 | 0.639 | -3.29e+06 | 5.37e+06 |
| 0.533 | 5.854e+06 | 3.82e+06 | 1.532 | 0.126 | -1.65e+06 | 1.34e+07 |
| 0.534 | 2.924e+06 | 2.7e+06 | 1.082 | 0.280 | -2.38e+06 | 8.23e+06 |
| 0.536 | 3.248e+06 | 3.82e+06 | 0.850 | 0.396 | -4.25e+06 | 1.07e+07 |
| 0.537 | 3.004e+06 | 2.7e+06 | 1.112 | 0.266 | -2.3e+06 | 8.31e+06 |
| 0.538 | 1.163e+06 | 1.71e+06 | 0.681 | 0.496 | -2.19e+06 | 4.52e+06 |
| 0.539 | 4e+06 | 3.82e+06 | 1.047 | 0.295 | -3.5e+06 | 1.15e+07 |
| 0.54 | 4e+06 | 3.82e+06 | 1.047 | 0.295 | -3.5e+06 | 1.15e+07 |
| 0.541 | 6.331e+06 | 3.82e+06 | 1.657 | 0.098 | -1.17e+06 | 1.38e+07 |
| 0.542 | 3.402e+06 | 2.7e+06 | 1.259 | 0.208 | -1.9e+06 | 8.71e+06 |
| 0.543 | 2.026e+06 | 1.91e+06 | 1.061 | 0.289 | -1.72e+06 | 5.78e+06 |
| 0.545 | 5.861e+05 | 3.82e+06 | 0.153 | 0.878 | -6.92e+06 | 8.09e+06 |
| 0.548 | 2.009e+06 | 3.82e+06 | 0.526 | 0.599 | -5.49e+06 | 9.51e+06 |
| 0.551 | 7.877e+05 | 2.7e+06 | 0.292 | 0.771 | -4.52e+06 | 6.09e+06 |
| 0.552 | 1.582e+06 | 2.7e+06 | 0.586 | 0.558 | -3.72e+06 | 6.89e+06 |
| 0.553 | 2.3e+06 | 3.82e+06 | 0.602 | 0.547 | -5.2e+06 | 9.8e+06 |
| 0.555 | 9.705e+06 | 3.82e+06 | 2.540 | 0.011 | 2.2e+06 | 1.72e+07 |
| 0.556 | 5.272e+05 | 2.7e+06 | 0.195 | 0.845 | -4.78e+06 | 5.83e+06 |
| 0.558 | 1.307e+07 | 2.7e+06 | 4.838 | 0.000 | 7.76e+06 | 1.84e+07 |
| 0.559 | 1.176e+06 | 3.82e+06 | 0.308 | 0.758 | -6.33e+06 | 8.68e+06 |
| 0.561 | 5.15e+06 | 2.7e+06 | 1.906 | 0.057 | -1.54e+05 | 1.05e+07 |
| 0.563 | 3.398e+06 | 3.82e+06 | 0.890 | 0.374 | -4.1e+06 | 1.09e+07 |
| 0.564 | 2.943e+06 | 3.82e+06 | 0.770 | 0.441 | -4.56e+06 | 1.04e+07 |
| 0.565 | 7.262e+06 | 2.21e+06 | 3.292 | 0.001 | 2.93e+06 | 1.16e+07 |
| 0.568 | 2.213e+06 | 2.7e+06 | 0.819 | 0.413 | -3.09e+06 | 7.52e+06 |
| 0.571 | 1.608e+06 | 2.21e+06 | 0.729 | 0.466 | -2.72e+06 | 5.94e+06 |
| 0.573 | 1.357e+06 | 3.82e+06 | 0.355 | 0.723 | -6.14e+06 | 8.86e+06 |
| 0.575 | 1.271e+06 | 3.82e+06 | 0.333 | 0.739 | -6.23e+06 | 8.77e+06 |
| 0.576 | 1.64e+07 | 3.82e+06 | 4.293 | 0.000 | 8.9e+06 | 2.39e+07 |
| 0.581 | 1.376e+06 | 2.21e+06 | 0.624 | 0.533 | -2.95e+06 | 5.71e+06 |
| 0.582 | 2.691e+06 | 2.7e+06 | 0.996 | 0.319 | -2.61e+06 | 8e+06 |
| 0.585 | 1.2e+07 | 3.82e+06 | 3.141 | 0.002 | 4.5e+06 | 1.95e+07 |
| 0.588 | 1.426e+07 | 3.82e+06 | 3.733 | 0.000 | 6.76e+06 | 2.18e+07 |
| 0.596 | 1.101e+06 | 3.82e+06 | 0.288 | 0.773 | -6.4e+06 | 8.6e+06 |
| 0.6 | 3.221e+06 | 2.7e+06 | 1.192 | 0.234 | -2.08e+06 | 8.53e+06 |
| 0.601 | 9.811e+05 | 3.82e+06 | 0.257 | 0.797 | -6.52e+06 | 8.48e+06 |
| 0.602 | 4.25e+06 | 3.82e+06 | 1.112 | 0.266 | -3.25e+06 | 1.18e+07 |
| 0.604 | 7.976e+05 | 3.82e+06 | 0.209 | 0.835 | -6.7e+06 | 8.3e+06 |
| 0.605 | 4.238e+05 | 3.82e+06 | 0.111 | 0.912 | -7.08e+06 | 7.92e+06 |
| 0.606 | 9.813e+05 | 3.82e+06 | 0.257 | 0.797 | -6.52e+06 | 8.48e+06 |
| 0.613 | 2.279e+06 | 3.82e+06 | 0.597 | 0.551 | -5.22e+06 | 9.78e+06 |
| 0.62 | 2.236e+07 | 3.82e+06 | 5.853 | 0.000 | 1.49e+07 | 2.99e+07 |
| 0.627 | 1.2e+07 | 3.82e+06 | 3.141 | 0.002 | 4.5e+06 | 1.95e+07 |
| 0.636 | 1.842e+06 | 3.82e+06 | 0.482 | 0.630 | -5.66e+06 | 9.34e+06 |
| 0.644 | 1e+06 | 3.82e+06 | 0.262 | 0.794 | -6.5e+06 | 8.5e+06 |
| 0.65 | 8.936e+04 | 3.82e+06 | 0.023 | 0.981 | -7.41e+06 | 7.59e+06 |
| 0.667 | 7.077e+05 | 2.21e+06 | 0.321 | 0.748 | -3.62e+06 | 5.04e+06 |
| 0.703 | 1.95e+07 | 3.82e+06 | 5.104 | 0.000 | 1.2e+07 | 2.7e+07 |
| 0.714 | 2.252e+06 | 3.82e+06 | 0.590 | 0.556 | -5.25e+06 | 9.75e+06 |
| 0.719 | 1.352e+06 | 3.82e+06 | 0.354 | 0.723 | -6.15e+06 | 8.85e+06 |
| 0.75 | 3.06e+05 | 2.21e+06 | 0.139 | 0.890 | -4.02e+06 | 4.64e+06 |
| 0.8 | 3.825e+06 | 2.7e+06 | 1.416 | 0.157 | -1.48e+06 | 9.13e+06 |
| 1.0 | 2.548e+06 | 1.71e+06 | 1.492 | 0.136 | -8.06e+05 | 5.9e+06 |
| Omnibus: | 273.664 | Durbin-Watson: | 1.746 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 982.764 |
| Skew: | 1.347 | Prob(JB): | 3.94e-214 |
| Kurtosis: | 7.179 | Cond. No. | 4.58 |
lm.summary2().tables[1]['P>|t|']
0.000 0.531865
0.083 0.648571
0.100 0.957740
0.118 0.739386
0.125 0.833231
...
0.714 0.555662
0.719 0.723487
0.750 0.889694
0.800 0.157251
1.000 0.136271
Name: P>|t|, Length: 288, dtype: float64
As seen from the results of the fitting above, the uncentered R-squared value of 0.661 indicates only a moderate linear relationship between field goal percentage and salary. This suggests that, in general, this statistic alone may not be the best predictor of NBA income.
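One way to tame the 288 distinct FG% levels before encoding (which also makes each dummy column better populated) is to bin the continuous column into coarse shooting tiers with `pd.cut`. The cut points and sample values below are hypothetical:

```python
import pandas as pd

# Hypothetical field goal percentages -- illustrative only
fg = pd.Series([0.31, 0.42, 0.45, 0.48, 0.52, 0.61])

# Bin into coarse shooting tiers, then one-hot encode the bins
tiers = pd.cut(fg, bins=[0.0, 0.40, 0.50, 1.0], labels=["low", "mid", "high"])
X = pd.get_dummies(tiers)
print(X.sum())  # number of players falling in each tier
```

With only three tier columns instead of 288 value columns, the resulting regression is far less prone to the overfitting seen above.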
Relationship between Experience and Salary
Finally, let's revisit one more of the earlier claims and test it with the same ANOVA test and OLS regression.
Our null hypothesis will be that there is no difference between the true salaries for all the populations of experiences, while our alternative is that there is at least one population that differs.
# One-hot encode experience, treating each year count as its own category
X = pd.get_dummies(salaries["Experience"])
# Extract the dependent variable from the dataframe
y = salaries["Salary"]
# Fit the linear regression model using OLS
lm = sm.OLS(y, X).fit()
lm.summary()
| Dep. Variable: | Salary | R-squared: | 0.358 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.343 |
| Method: | Least Squares | F-statistic: | 24.74 |
| Date: | Sat, 17 Dec 2022 | Prob (F-statistic): | 2.92e-75 |
| Time: | 01:05:36 | Log-Likelihood: | -15719. |
| No. Observations: | 954 | AIC: | 3.148e+04 |
| Df Residuals: | 932 | BIC: | 3.159e+04 |
| Df Model: | 21 | ||
| Covariance Type: | nonrobust |
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| -10 | 9.2e+06 | 3.51e+06 | 2.625 | 0.009 | 2.32e+06 | 1.61e+07 |
| -8 | 9.25e+06 | 2.48e+06 | 3.732 | 0.000 | 4.39e+06 | 1.41e+07 |
| 1 | 1.025e+06 | 2.75e+05 | 3.723 | 0.000 | 4.85e+05 | 1.57e+06 |
| 2 | 1.134e+06 | 2.77e+05 | 4.092 | 0.000 | 5.9e+05 | 1.68e+06 |
| 3 | 1.46e+06 | 2.97e+05 | 4.912 | 0.000 | 8.77e+05 | 2.04e+06 |
| 4 | 2.43e+06 | 3.72e+05 | 6.540 | 0.000 | 1.7e+06 | 3.16e+06 |
| 5 | 5.046e+06 | 3.99e+05 | 12.631 | 0.000 | 4.26e+06 | 5.83e+06 |
| 6 | 4.41e+06 | 4.68e+05 | 9.415 | 0.000 | 3.49e+06 | 5.33e+06 |
| 7 | 5.634e+06 | 4.64e+05 | 12.135 | 0.000 | 4.72e+06 | 6.55e+06 |
| 8 | 8.429e+06 | 5.47e+05 | 15.396 | 0.000 | 7.35e+06 | 9.5e+06 |
| 9 | 6.717e+06 | 4.96e+05 | 13.549 | 0.000 | 5.74e+06 | 7.69e+06 |
| 10 | 5.007e+06 | 6.01e+05 | 8.329 | 0.000 | 3.83e+06 | 6.19e+06 |
| 11 | 7.674e+06 | 6.87e+05 | 11.163 | 0.000 | 6.33e+06 | 9.02e+06 |
| 12 | 7.621e+06 | 8.26e+05 | 9.224 | 0.000 | 6e+06 | 9.24e+06 |
| 13 | 7.261e+06 | 8.26e+05 | 8.788 | 0.000 | 5.64e+06 | 8.88e+06 |
| 14 | 8.671e+06 | 1.75e+06 | 4.947 | 0.000 | 5.23e+06 | 1.21e+07 |
| 15 | 5.475e+06 | 1.17e+06 | 4.686 | 0.000 | 3.18e+06 | 7.77e+06 |
| 16 | 3.654e+06 | 1.43e+06 | 2.553 | 0.011 | 8.46e+05 | 6.46e+06 |
| 17 | 1.498e+06 | 2.48e+06 | 0.605 | 0.546 | -3.37e+06 | 6.36e+06 |
| 18 | 8.333e+06 | 3.51e+06 | 2.377 | 0.018 | 1.45e+06 | 1.52e+07 |
| 19 | 5e+06 | 3.51e+06 | 1.426 | 0.154 | -1.88e+06 | 1.19e+07 |
| 20 | 2.5e+07 | 3.51e+06 | 7.132 | 0.000 | 1.81e+07 | 3.19e+07 |
| Omnibus: | 275.981 | Durbin-Watson: | 1.505 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 881.471 |
| Skew: | 1.406 | Prob(JB): | 3.90e-192 |
| Kurtosis: | 6.778 | Cond. No. | 12.7 |
lm.summary2().tables[1]['P>|t|']
-10    8.819450e-03
-8     2.016342e-04
 1     2.088228e-04
 2     4.646078e-05
 3     1.065057e-06
 4     1.009440e-10
 5     7.159687e-34
 6     3.586994e-20
 7     1.436488e-31
 8     7.962659e-48
 9     2.625087e-38
 10    2.889075e-16
 11    3.028402e-27
 12    1.863755e-19
 13    7.239497e-18
 14    8.939409e-07
 15    3.201633e-06
 16    1.082603e-02
 17    5.456335e-01
 18    1.764224e-02
 19    1.540987e-01
 20    1.986831e-12
Name: P>|t|, dtype: float64
As seen from the results of the fitting above, the R-squared value of 0.358 indicates only a weak linear correlation between experience and salary. This suggests that, in general, this statistic on its own may not be the best predictor of NBA income.
The format of our data, as well as the nature of salary data itself, allowed us to demonstrate many parts of the data science pipeline.
Based on the discoveries we made throughout this tutorial, we think that NBA players who consistently perform well will get paid more, as shown by the strong correlation between points per game and salary. As obvious as this result sounds, some of our other results were definitely unexpected.
When testing something as simple as experience, one would expect salary to increase steadily over time; however, due to the complexity of salary information, our data showed this may not quite be the case. When we fit a linear model, the coefficient was positive, but the regression hardly captured the nature of the data: the R-squared value did not exceed 0.358, even with a simplified and cleaned dataset.
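The uneven pattern of salary across experience levels is easy to see by averaging salary within each level. The (experience, salary) pairs below are made up for illustration, not taken from the notebook's data:

```python
import pandas as pd

# Hypothetical (experience, salary) pairs -- illustrative only
df = pd.DataFrame({
    "Experience": [1, 1, 5, 5, 10, 10, 16, 16],
    "Salary": [1.0e6, 1.1e6, 4.8e6, 5.2e6, 5.5e6, 4.5e6, 3.7e6, 3.6e6],
})

# Mean salary per experience level; note the dip at the highest level
mean_by_exp = df.groupby("Experience")["Salary"].mean()
print(mean_by_exp)
```

A pattern like this (rising, then falling) is exactly the kind of non-linearity a single straight line fails to capture.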
Another insight we gained during this tutorial is that our original claim of a salary difference between colleges did not hold up: for most schools, the test concluded with us failing to reject the null hypothesis. We also learned that there is only a very weak correlation between college and salary, with an R-squared of 0.155.
A final insight that we learned throughout this tutorial is that Field Goal percentage is a moderate predictor of NBA salary, as the r-squared value is around 0.661.
For future steps, we would like to explore if it would be possible to fit more advanced models to salary with relationships that suggest statistical significance.
If we were to move further along with exploratory data analysis and hypothesis testing for this dataset, our next steps would involve answering questions based on other characteristics of the observations in the data.
Ultimately, success in the NBA is not defined by compensation. However, it will always be an obvious incentive. Thus, an analysis like this is still important in the scope of the future of the NBA and NCAA.